2023-11-18 01:38:34,543 INFO [train_asr.py:1183] (3/4) Training started
2023-11-18 01:38:34,544 INFO [train_asr.py:1193] (3/4) Device: cuda:3
2023-11-18 01:38:34,546 INFO [train_asr.py:1205] (3/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2b2ac14b326d61d79d04e53fbd69b1ff6d630411', 'k2-git-date': 'Thu Aug 24 05:58:26 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'multi_KD', 'icefall-git-sha1': '025f11fd-dirty', 'icefall-git-date': 'Fri Nov 17 16:19:07 2023', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_multi_KD', 'k2-path': '/star-xy/softwares/k2_development/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/anaconda3/envs/multi_KD/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-10-1113160712-78bc8d8bd8-pw6cd', 'IP address': '10.177.94.17'}, 'world_size': 4, 'master_port': 13454, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 0.2, 'audio_tagging_loss_scale': 1.0, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'do_audio_tagging': True, 'full_libri': True, 'mini_libri': False, 'use_vox2': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_audioset': True, 'audioset_subset': 'unbalanced', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'small.en', 'blank_id': 0, 'vocab_size': 500}
2023-11-18 01:38:34,546 INFO [train_asr.py:1207] (3/4) About to create model
2023-11-18 01:38:35,354 INFO [train_asr.py:1211] (3/4) Number of model parameters: 65819362
2023-11-18 01:38:38,404 INFO [train_asr.py:1227] (3/4) Using DDP
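Note: the two entries above ("Number of model parameters: 65819362" and "Using DDP") correspond to routine PyTorch operations. A minimal sketch of what they imply, assuming the usual setup (train_asr.py itself is not shown in this log; rank 3 of 4 comes from the "(3/4)" prefix on every line):

```python
# Sketch only: parameter counting and DDP wrapping as suggested by the
# log lines above. Assumes torch.distributed is already initialised
# with world_size=4.
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def count_parameters(model: torch.nn.Module) -> int:
    # Produces the number reported as "Number of model parameters".
    return sum(p.numel() for p in model.parameters())

def wrap_ddp(model: torch.nn.Module, rank: int = 3) -> torch.nn.Module:
    # Move the model to this process's GPU (cuda:3 in this log), then wrap.
    model = model.to(torch.device("cuda", rank))
    return DDP(model, device_ids=[rank])
```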
2023-11-18 01:38:39,398 INFO [train_asr.py:1271] (3/4) Getting audioset cuts
2023-11-18 01:38:39,399 INFO [kd_datamodule.py:796] (3/4) About to get the audioset cuts.
2023-11-18 01:38:39,462 INFO [train_asr.py:1277] (3/4) Using mux to combine Librispeech with audioset
2023-11-18 01:38:39,462 INFO [train_asr.py:1287] (3/4) CutSet(len=2748469) [underlying data type: ]
2023-11-18 01:38:48,544 INFO [kd_datamodule.py:396] (3/4) Enable MUSAN
2023-11-18 01:38:48,544 INFO [kd_datamodule.py:397] (3/4) About to get Musan cuts
2023-11-18 01:38:51,039 INFO [kd_datamodule.py:427] (3/4) Enable SpecAugment
2023-11-18 01:38:51,039 INFO [kd_datamodule.py:428] (3/4) Time warp factor: 80
2023-11-18 01:38:51,039 INFO [kd_datamodule.py:438] (3/4) Num frame mask: 10
2023-11-18 01:38:51,039 INFO [kd_datamodule.py:451] (3/4) About to create train dataset
2023-11-18 01:38:51,044 INFO [kd_datamodule.py:487] (3/4) Using SimpleCutSampler
2023-11-18 01:38:51,044 INFO [kd_datamodule.py:495] (3/4) About to create train dataloader
2023-11-18 01:38:51,085 INFO [kd_datamodule.py:814] (3/4) About to get the audioset eval cuts.
2023-11-18 01:38:51,122 INFO [kd_datamodule.py:529] (3/4) About to create dev dataset
2023-11-18 01:38:51,614 INFO [kd_datamodule.py:550] (3/4) About to create dev dataloader
2023-11-18 01:39:26,880 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 0, loss[loss=3.462, simple_loss=1.936, pruned_loss=1.904, audio_tagging_loss=1.335, over 14522.00 frames. ], tot_loss[loss=3.462, simple_loss=1.936, pruned_loss=1.904, audio_tagging_loss=1.335, over 14522.00 frames. ], batch size: 58, lr: 2.25e-02, grad_scale: 2.0
2023-11-18 01:39:26,881 INFO [train_asr.py:1138] (3/4) Computing validation loss
2023-11-18 01:39:55,362 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([6.0163, 6.1027, 6.0773, 6.0335], device='cuda:3')
2023-11-18 01:39:56,099 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.9094, 6.0802, 6.0499, 5.9986], device='cuda:3')
2023-11-18 01:40:00,446 INFO [train_asr.py:1147] (3/4) Epoch 1, validation: loss=2.927, simple_loss=1.349, pruned_loss=1.339, audio_tagging_loss=1.444, over 4681554.00 frames.
2023-11-18 01:40:00,446 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB
2023-11-18 01:40:01,416 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.01 vs. limit=7.5
2023-11-18 01:40:16,947 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=29.83 vs. limit=5.033333333333333
2023-11-18 01:40:24,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=66.66666666666667, ans=0.49166666666666664
2023-11-18 01:40:24,748 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=60.77 vs. limit=7.525
2023-11-18 01:40:28,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=133.33333333333334, ans=0.29866666666666664
2023-11-18 01:40:32,475 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=382.79 vs. limit=7.6
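Note: the datamodule entries above ("Using mux to combine Librispeech with audioset", "Using SimpleCutSampler") map onto public lhotse APIs. A rough sketch under stated assumptions; the manifest paths below are hypothetical, and any mux weighting the recipe uses is not shown in this log:

```python
# Rough lhotse sketch of the datamodule steps logged above.
from lhotse import CutSet
from lhotse.dataset import SimpleCutSampler

libri = CutSet.from_file("data/fbank/librispeech_cuts_train.jsonl.gz")       # hypothetical path
audioset = CutSet.from_file("data/fbank/audioset_cuts_unbalanced.jsonl.gz")  # hypothetical path

# "Using mux to combine Librispeech with audioset": mux lazily
# interleaves the two CutSets, giving the combined CutSet(len=2748469).
train_cuts = CutSet.mux(libri, audioset)

# "Using SimpleCutSampler": batches are capped by total duration.
# max_duration=1000, shuffle=True, drop_last=True come from the config dump.
sampler = SimpleCutSampler(train_cuts, max_duration=1000, shuffle=True, drop_last=True)
```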
2023-11-18 01:40:37,346 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=115.31 vs. limit=7.6
2023-11-18 01:40:40,987 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=480.88 vs. limit=7.55
2023-11-18 01:40:43,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=200.0, ans=0.1925
2023-11-18 01:40:49,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=200.0, ans=0.490625
2023-11-18 01:41:06,431 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=192.35 vs. limit=7.6
2023-11-18 01:41:09,568 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 50, loss[loss=0.3831, simple_loss=0.2956, pruned_loss=0.3398, audio_tagging_loss=0.04952, over 14275.00 frames. ], tot_loss[loss=1.359, simple_loss=1.009, pruned_loss=0.8645, audio_tagging_loss=0.2608, over 685538.52 frames. ], batch size: 56, lr: 2.48e-02, grad_scale: 1.0
2023-11-18 01:41:15,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=333.3333333333333, ans=0.1875
2023-11-18 01:41:31,249 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.59 vs. limit=4.16
2023-11-18 01:41:32,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=400.0, ans=0.091
2023-11-18 01:41:36,254 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.94 vs. limit=3.07
2023-11-18 01:41:36,611 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=82.27 vs. limit=7.85
2023-11-18 01:41:38,754 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=230.08 vs. limit=7.675
2023-11-18 01:41:41,511 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=41.34 vs. limit=4.1866666666666665
2023-11-18 01:41:44,322 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=421.62 vs. limit=7.675
2023-11-18 01:41:53,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=533.3333333333334, ans=0.8813333333333333
2023-11-18 01:41:55,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=533.3333333333334, ans=0.29466666666666663
2023-11-18 01:41:58,805 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=29.91 vs. limit=5.133333333333334
2023-11-18 01:42:18,211 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 100, loss[loss=0.4627, simple_loss=0.3766, pruned_loss=0.4553, audio_tagging_loss=0.02935, over 15575.00 frames. ], tot_loss[loss=0.8865, simple_loss=0.6778, pruned_loss=0.6515, audio_tagging_loss=0.1376, over 1208020.15 frames. ], batch size: 57, lr: 2.70e-02, grad_scale: 2.0
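Note: each train line reports the batch loss together with its components. The batch-0 numbers further up are consistent with the warm-up weighting used in icefall pruned-transducer recipes (an assumption, since train_asr.py itself is not shown in this log): the simple-loss weight starts at 1.0 and decays to simple_loss_scale=0.5 over warm_step=2000 batches, while the pruned-loss weight ramps from 0.1 to 1.0, and the audio-tagging loss is added with audio_tagging_loss_scale=1.0. A worked check:

```python
# Worked check of the batch-0 loss line, under the assumed warm-up weighting.
warm_step = 2000   # from the config dump
s = 0.5            # simple_loss_scale
batch_idx = 0

simple_scale = s if batch_idx >= warm_step else 1.0 - (batch_idx / warm_step) * (1.0 - s)
pruned_scale = 1.0 if batch_idx >= warm_step else 0.1 + 0.9 * (batch_idx / warm_step)

loss = (simple_scale * 1.936     # simple_loss, weight 1.0 at batch 0
        + pruned_scale * 1.904   # pruned_loss, weight 0.1 at batch 0
        + 1.0 * 1.335)           # audio_tagging_loss * audio_tagging_loss_scale
print(round(loss, 3))            # 3.461, vs. the logged loss=3.462 (components are rounded)
```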
2023-11-18 01:42:19,525 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 4.039e+01 1.213e+02 5.684e+02 1.606e+03 1.428e+04, threshold=1.137e+03, percent-clipped=0.0
2023-11-18 01:42:28,203 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=70.84 vs. limit=7.75
2023-11-18 01:42:30,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=733.3333333333334, ans=0.4083333333333333
2023-11-18 01:42:30,753 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=89.61 vs. limit=7.775
2023-11-18 01:42:33,326 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=180.52 vs. limit=8.05
2023-11-18 01:42:35,836 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=16.76 vs. limit=5.183333333333334
2023-11-18 01:42:36,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=733.3333333333334, ans=7.775
2023-11-18 01:42:37,116 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=340.91 vs. limit=7.775
2023-11-18 01:42:38,620 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.78 vs. limit=5.183333333333334
2023-11-18 01:42:44,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=800.0, ans=0.4625
2023-11-18 01:42:51,283 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=4.32
2023-11-18 01:42:52,872 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=199.38 vs. limit=8.1
2023-11-18 01:43:03,145 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=98.77 vs. limit=7.825
2023-11-18 01:43:06,841 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.61 vs. limit=4.346666666666667
2023-11-18 01:43:08,227 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=317.78 vs. limit=8.15
2023-11-18 01:43:09,344 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=126.70 vs. limit=8.15
2023-11-18 01:43:18,703 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=154.01 vs. limit=8.2
2023-11-18 01:43:21,357 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=102.27 vs. limit=7.85
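Note: the [optim.py:476] entries print quartiles of recent per-batch gradient norms plus a clipping threshold. The numbers above satisfy threshold = Clipping_scale x median (1.137e+03 = 2.0 * 5.684e+02). A simplified reconstruction of the diagnostic only; the real ScaledAdam bookkeeping is more involved:

```python
# Simplified reconstruction of the grad-norm diagnostic. "norms" would
# hold the most recent per-batch gradient norms.
import torch

def grad_norm_report(norms: torch.Tensor, clipping_scale: float = 2.0) -> str:
    q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = (clipping_scale * q[2]).item()                 # 2.0 x median
    pct_clipped = 100.0 * (norms > threshold).float().mean().item()
    quartiles = " ".join(f"{v:.3e}" for v in q.tolist())
    return (f"Clipping_scale={clipping_scale}, grad-norm quartiles {quartiles}, "
            f"threshold={threshold:.3e}, percent-clipped={pct_clipped:.1f}")
```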
2023-11-18 01:43:22,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=933.3333333333334, ans=8.2
2023-11-18 01:43:24,596 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 150, loss[loss=0.403, simple_loss=0.3283, pruned_loss=0.4028, audio_tagging_loss=0.01947, over 14566.00 frames. ], tot_loss[loss=0.7023, simple_loss=0.5449, pruned_loss=0.5649, audio_tagging_loss=0.0916, over 1614055.29 frames. ], batch size: 54, lr: 2.93e-02, grad_scale: 2.0
2023-11-18 01:43:34,393 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=102.92 vs. limit=7.875
2023-11-18 01:43:38,437 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=61.30 vs. limit=7.9
2023-11-18 01:43:41,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=89.95 vs. limit=5.533333333333333
2023-11-18 01:43:43,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1066.6666666666667, ans=0.45
2023-11-18 01:43:57,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=1133.3333333333333, ans=0.1575
2023-11-18 01:44:09,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1200.0, ans=0.288
2023-11-18 01:44:12,418 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=207.00 vs. limit=7.95
2023-11-18 01:44:14,970 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=100.32 vs. limit=7.95
2023-11-18 01:44:17,634 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=330.45 vs. limit=7.975
2023-11-18 01:44:18,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=1266.6666666666667, ans=0.28733333333333333
2023-11-18 01:44:21,382 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=19.40 vs. limit=5.316666666666666
2023-11-18 01:44:23,106 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=54.85 vs. limit=7.975
2023-11-18 01:44:25,321 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=97.37 vs. limit=7.975
2023-11-18 01:44:28,267 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.84 vs. limit=8.45
2023-11-18 01:44:32,260 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 200, loss[loss=0.4636, simple_loss=0.3772, pruned_loss=0.4528, audio_tagging_loss=0.01913, over 15079.00 frames. ], tot_loss[loss=0.5949, simple_loss=0.4653, pruned_loss=0.5027, audio_tagging_loss=0.06745, over 1926368.69 frames. ], batch size: 55, lr: 3.15e-02, grad_scale: 4.0
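Note: the lr column climbs from 2.25e-02 at batch 0 towards base_lr=0.045 by about batch 500 and then plateaus near 4.49e-02. This ramp matches an Eden-style schedule with the configured base_lr=0.045, lr_batches=7500, lr_epochs=3.5, plus a 500-batch warm-up from 0.5x; the warm-up length and the 0-based epoch counter are assumptions, since neither appears in the config dump above:

```python
# Eden-style LR reconstruction (a sketch, under the assumptions stated above).
def eden_lr(batch: int, epoch: int = 0, base_lr: float = 0.045,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5,
            warmup_batches: float = 500.0) -> float:
    batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    warmup = 1.0 if batch >= warmup_batches else 0.5 + 0.5 * batch / warmup_batches
    return base_lr * batch_factor * epoch_factor * warmup

print(f"{eden_lr(0):.2e}")    # 2.25e-02, as logged at batch 0
print(f"{eden_lr(200):.2e}")  # ~3.15e-02, as logged at batch 200
print(f"{eden_lr(500):.2e}")  # ~4.49e-02, the plateau reached by batch 500
```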
2023-11-18 01:44:33,550 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.394e+01 4.484e+01 5.110e+01 6.274e+01 1.485e+02, threshold=1.022e+02, percent-clipped=0.0
2023-11-18 01:44:39,173 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=79.84 vs. limit=5.0
2023-11-18 01:44:46,326 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=105.08 vs. limit=8.025
2023-11-18 01:44:49,173 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=77.72 vs. limit=8.55
2023-11-18 01:44:52,725 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=199.79 vs. limit=8.025
2023-11-18 01:45:00,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=1466.6666666666667, ans=0.2853333333333333
2023-11-18 01:45:09,452 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 01:45:10,879 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=70.29 vs. limit=8.05
2023-11-18 01:45:11,331 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=107.73 vs. limit=8.05
2023-11-18 01:45:15,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=1533.3333333333333, ans=0.035
2023-11-18 01:45:16,320 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=132.89 vs. limit=8.65
2023-11-18 01:45:18,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=1533.3333333333333, ans=0.0655
2023-11-18 01:45:18,761 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.00 vs. limit=4.613333333333333
2023-11-18 01:45:19,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1533.3333333333333, ans=0.428125
2023-11-18 01:45:22,749 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=14.90 vs. limit=5.383333333333333
2023-11-18 01:45:25,176 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=47.28 vs. limit=8.7
2023-11-18 01:45:34,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=1600.0, ans=0.14
2023-11-18 01:45:37,773 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=244.00 vs. limit=8.1
2023-11-18 01:45:39,024 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=57.96 vs. limit=8.7
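Note: with 'use_fp16': True, the grad_scale column in the train lines is the AMP loss scale. It is halved when a step produces infs/NaNs (2.0 drops to 1.0 by batch 50 above) and grown back periodically (it reaches 16.0 by batch 400 later in this log). A generic torch.cuda.amp sketch; icefall drives the growth schedule itself, which is not reproduced here:

```python
# Generic fp16 training step behind the grad_scale column.
# init_scale=2.0 mirrors the first logged value.
import torch
from torch.cuda.amp import GradScaler, autocast

scaler = GradScaler(init_scale=2.0)

def fp16_step(model, optimizer, batch):
    optimizer.zero_grad()
    with autocast():
        loss = model(batch)        # placeholder: forward pass returning the loss
    scaler.scale(loss).backward()  # scaled backward keeps fp16 grads finite
    scaler.step(optimizer)         # skipped internally if infs/NaNs were found
    scaler.update()                # halves the scale on overflow, grows it later
    return scaler.get_scale()      # the value the log prints as grad_scale
```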
2023-11-18 01:45:39,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1666.6666666666667, ans=0.421875
2023-11-18 01:45:41,047 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 250, loss[loss=0.3652, simple_loss=0.2884, pruned_loss=0.3387, audio_tagging_loss=0.02276, over 13991.00 frames. ], tot_loss[loss=0.5276, simple_loss=0.4148, pruned_loss=0.4579, audio_tagging_loss=0.0528, over 2172163.16 frames. ], batch size: 53, lr: 3.38e-02, grad_scale: 4.0
2023-11-18 01:45:41,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=1666.6666666666667, ans=0.0625
2023-11-18 01:45:44,223 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=160.53 vs. limit=8.75
2023-11-18 01:45:45,687 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=155.70 vs. limit=5.833333333333333
2023-11-18 01:45:48,036 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=77.75 vs. limit=8.125
2023-11-18 01:45:50,650 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.26 vs. limit=4.666666666666667
2023-11-18 01:46:00,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=1733.3333333333333, ans=0.18396666666666667
2023-11-18 01:46:00,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1733.3333333333333, ans=0.2826666666666667
2023-11-18 01:46:07,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=1800.0, ans=0.415625
2023-11-18 01:46:14,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=1800.0, ans=0.837
2023-11-18 01:46:14,866 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=52.46 vs. limit=5.9
2023-11-18 01:46:15,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=1800.0, ans=0.227
2023-11-18 01:46:19,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=1866.6666666666667, ans=0.26666666666666666
2023-11-18 01:46:21,091 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.090e-02
2023-11-18 01:46:22,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1866.6666666666667, ans=0.23133333333333334
2023-11-18 01:46:37,034 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=52.31 vs. limit=5.966666666666667
2023-11-18 01:46:40,972 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.69 vs. limit=8.95
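Note: the [scaling.py:213] entries print module hyper-parameters whose values are piecewise-linear functions of batch_count. For example, a (0, 0.5) -> (8000, 0.125) schedule reproduces the balancer prob values logged above: 0.490625 at batch_count=200, 0.415625 at 1800. A minimal reconstruction (the real ScheduledFloat lives in icefall's scaling.py and has more features, such as a default value):

```python
# Minimal ScheduledFloat reconstruction: linear interpolation between
# (batch_count, value) breakpoints, clamped at the endpoints.
import numpy as np

class ScheduledFloat:
    def __init__(self, *points):
        self.x, self.y = zip(*points)   # (batch_count, value) breakpoints

    def value(self, batch_count: float) -> float:
        return float(np.interp(batch_count, self.x, self.y))

prob = ScheduledFloat((0.0, 0.5), (8000.0, 0.125))
print(prob.value(200.0))    # 0.490625, as logged at batch_count=200.0
print(prob.value(1800.0))   # 0.415625, as logged at batch_count=1800.0
```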
2023-11-18 01:46:42,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=1933.3333333333333, ans=8.225
2023-11-18 01:46:45,058 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=33.16 vs. limit=8.225
2023-11-18 01:46:46,670 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 300, loss[loss=0.3182, simple_loss=0.2482, pruned_loss=0.2892, audio_tagging_loss=0.02058, over 14494.00 frames. ], tot_loss[loss=0.4868, simple_loss=0.3839, pruned_loss=0.4295, audio_tagging_loss=0.04313, over 2367299.45 frames. ], batch size: 57, lr: 3.60e-02, grad_scale: 8.0
2023-11-18 01:46:47,927 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.566e+01 4.754e+01 5.461e+01 6.771e+01 2.069e+02, threshold=1.092e+02, percent-clipped=3.0
2023-11-18 01:46:50,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=2000.0, ans=0.25
2023-11-18 01:46:52,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2000.0, ans=0.27999999999999997
2023-11-18 01:46:55,116 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.49 vs. limit=9.0
2023-11-18 01:46:58,754 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.34 vs. limit=5.516666666666667
2023-11-18 01:47:01,279 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=119.99 vs. limit=8.275
2023-11-18 01:47:09,726 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=74.23 vs. limit=9.05
2023-11-18 01:47:09,965 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=62.51 vs. limit=8.275
2023-11-18 01:47:17,874 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=69.85 vs. limit=8.3
2023-11-18 01:47:24,064 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=53.71 vs. limit=8.3
2023-11-18 01:47:25,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2200.0, ans=0.050499999999999996
2023-11-18 01:47:25,588 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=108.43 vs. limit=9.15
2023-11-18 01:47:30,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=2200.0, ans=0.08625000000000001
2023-11-18 01:47:31,669 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.05 vs. limit=5.55
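Note: a "Whitening: ... metric=X vs. limit=Y" entry is emitted when the measured whitening metric exceeds its limit (the limit itself is a ScheduledFloat, as the whitening_limit entries above show). The metric is 1.0 when the centered feature covariance is proportional to the identity and grows with the eigenvalue spread. A simplified, single-group reconstruction of the metric; the real Whiten module in scaling.py also back-propagates a penalty, omitted here:

```python
# Simplified whitening-metric reconstruction (single group).
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations
    x = x - x.mean(dim=0, keepdim=True)                # centre the features
    covar = x.t() @ x                                  # unnormalised covariance, (C, C)
    mean_eig = covar.diagonal().mean()                 # trace / C = mean eigenvalue
    mean_eig_sq = (covar ** 2).sum() / covar.shape[0]  # mean squared eigenvalue
    # >= 1.0, with equality iff all eigenvalues are equal (perfectly "white")
    return (mean_eig_sq / (mean_eig ** 2 + 1e-20)).item()

print(whitening_metric(torch.randn(1000, 192)))        # slightly above 1.0 for noise
```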
2023-11-18 01:47:37,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2266.6666666666665, ans=0.39375
2023-11-18 01:47:39,167 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=50.48 vs. limit=8.35
2023-11-18 01:47:39,254 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=79.13 vs. limit=8.35
2023-11-18 01:47:39,479 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=54.58 vs. limit=8.35
2023-11-18 01:47:41,899 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=42.07 vs. limit=8.35
2023-11-18 01:47:49,525 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=70.80 vs. limit=8.35
2023-11-18 01:47:51,176 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 350, loss[loss=0.4072, simple_loss=0.3211, pruned_loss=0.37, audio_tagging_loss=0.01882, over 15001.00 frames. ], tot_loss[loss=0.4542, simple_loss=0.3577, pruned_loss=0.4037, audio_tagging_loss=0.03684, over 2514517.51 frames. ], batch size: 57, lr: 3.83e-02, grad_scale: 8.0
2023-11-18 01:47:53,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=2333.3333333333335, ans=0.390625
2023-11-18 01:47:59,120 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.85 vs. limit=5.583333333333333
2023-11-18 01:48:07,367 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=25.59 vs. limit=9.3
2023-11-18 01:48:08,433 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=67.32 vs. limit=8.4
2023-11-18 01:48:10,851 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=117.17 vs. limit=8.4
2023-11-18 01:48:13,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2400.0, ans=0.27599999999999997
2023-11-18 01:48:15,852 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=30.21 vs. limit=9.3
2023-11-18 01:48:18,355 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=27.91 vs. limit=6.233333333333333
2023-11-18 01:48:21,392 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.44 vs. limit=9.35
2023-11-18 01:48:24,651 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=54.20 vs. limit=8.425
2023-11-18 01:48:27,337 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=44.48 vs. limit=9.35
2023-11-18 01:48:53,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=2600.0, ans=0.1025
2023-11-18 01:48:56,372 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=105.17 vs. limit=9.5
2023-11-18 01:48:57,494 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 400, loss[loss=0.4968, simple_loss=0.3927, pruned_loss=0.4575, audio_tagging_loss=0.01516, over 15330.00 frames. ], tot_loss[loss=0.4317, simple_loss=0.3391, pruned_loss=0.385, audio_tagging_loss=0.03228, over 2636711.04 frames. ], batch size: 55, lr: 4.05e-02, grad_scale: 16.0
2023-11-18 01:48:58,706 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.943e+01 5.270e+01 6.183e+01 8.354e+01 3.927e+02, threshold=1.237e+02, percent-clipped=8.0
2023-11-18 01:49:05,674 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.57 vs. limit=5.066666666666666
2023-11-18 01:49:09,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=2733.3333333333335, ans=0.371875
2023-11-18 01:49:09,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2733.3333333333335, ans=0.0975
2023-11-18 01:49:09,375 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=81.07 vs. limit=8.525
2023-11-18 01:49:13,233 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.75 vs. limit=5.683333333333334
2023-11-18 01:49:15,574 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=79.98 vs. limit=8.525
2023-11-18 01:49:22,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=2800.0, ans=0.037000000000000005
2023-11-18 01:49:25,936 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=51.56 vs. limit=8.55
2023-11-18 01:49:33,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2800.0, ans=0.36875
2023-11-18 01:49:33,658 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.46 vs. limit=5.12
2023-11-18 01:49:37,309 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=80.81 vs. limit=8.575
2023-11-18 01:49:41,022 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=17.75 vs. limit=6.433333333333334
2023-11-18 01:49:42,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=2866.6666666666665, ans=0.7996666666666667
2023-11-18 01:49:42,747 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=27.50 vs. limit=9.65
2023-11-18 01:49:47,520 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=36.09 vs. limit=8.6
2023-11-18 01:49:52,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=2933.3333333333335, ans=0.08999999999999998
2023-11-18 01:49:52,835 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=61.83 vs. limit=8.6
2023-11-18 01:49:55,273 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=127.04 vs. limit=8.6
2023-11-18 01:50:00,765 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 450, loss[loss=0.4106, simple_loss=0.3206, pruned_loss=0.3713, audio_tagging_loss=0.01369, over 15544.00 frames. ], tot_loss[loss=0.4157, simple_loss=0.3255, pruned_loss=0.3705, audio_tagging_loss=0.02866, over 2729537.52 frames. ], batch size: 58, lr: 4.28e-02, grad_scale: 16.0
2023-11-18 01:50:01,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3000.0, ans=0.245
2023-11-18 01:50:02,462 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=45.80 vs. limit=8.625
2023-11-18 01:50:05,078 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=97.24 vs. limit=8.625
2023-11-18 01:50:07,289 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=38.54 vs. limit=9.75
2023-11-18 01:50:10,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3000.0, ans=0.125
2023-11-18 01:50:11,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3000.0, ans=9.75
2023-11-18 01:50:16,677 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.42 vs. limit=5.766666666666667
2023-11-18 01:50:16,709 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.16 vs. limit=5.766666666666667
2023-11-18 01:50:28,640 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=65.12 vs. limit=8.675
2023-11-18 01:50:28,790 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.49 vs. limit=9.85
2023-11-18 01:50:30,170 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.46 vs. limit=8.675
2023-11-18 01:50:34,237 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=31.62 vs. limit=9.85
2023-11-18 01:50:41,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3200.0, ans=0.35
2023-11-18 01:50:47,830 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.15 vs. limit=5.28
2023-11-18 01:50:52,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3266.6666666666665, ans=0.7856666666666667
2023-11-18 01:50:56,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3266.6666666666665, ans=0.346875
2023-11-18 01:50:58,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3266.6666666666665, ans=0.346875
2023-11-18 01:50:59,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3266.6666666666665, ans=7.041666666666666
2023-11-18 01:51:05,766 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 500, loss[loss=0.2935, simple_loss=0.2219, pruned_loss=0.2459, audio_tagging_loss=0.01936, over 15479.00 frames. ], tot_loss[loss=0.403, simple_loss=0.3143, pruned_loss=0.3577, audio_tagging_loss=0.02597, over 2802423.79 frames. ], batch size: 59, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:51:06,563 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=48.86 vs. limit=8.75
2023-11-18 01:51:06,950 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 3.968e+01 4.922e+01 5.274e+01 6.306e+01 1.338e+02, threshold=1.055e+02, percent-clipped=1.0
2023-11-18 01:51:12,463 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=3.5
2023-11-18 01:51:13,489 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=57.64 vs. limit=8.75
2023-11-18 01:51:13,644 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=25.01 vs. limit=8.75
2023-11-18 01:51:16,245 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=17.41 vs. limit=8.75
2023-11-18 01:51:17,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3400.0, ans=0.266
2023-11-18 01:51:24,272 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.09 vs. limit=8.775
2023-11-18 01:51:26,792 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.02 vs. limit=6.7
2023-11-18 01:51:27,883 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=120.97 vs. limit=8.775
2023-11-18 01:51:29,133 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.75 vs. limit=5.85
2023-11-18 01:51:42,558 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=54.55 vs. limit=8.825
2023-11-18 01:51:42,717 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.31 vs. limit=10.15
2023-11-18 01:51:49,838 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=60.35 vs. limit=8.825
2023-11-18 01:51:52,392 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=50.15 vs. limit=8.825
2023-11-18 01:52:03,148 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.87 vs. limit=5.4399999999999995
2023-11-18 01:52:09,206 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 550, loss[loss=0.4053, simple_loss=0.3071, pruned_loss=0.3417, audio_tagging_loss=0.02161, over 15161.00 frames. ], tot_loss[loss=0.3919, simple_loss=0.3041, pruned_loss=0.3451, audio_tagging_loss=0.02411, over 2848105.93 frames. ], batch size: 56, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:52:16,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3666.6666666666665, ans=0.2633333333333333
2023-11-18 01:52:17,194 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=25.15 vs. limit=8.875
2023-11-18 01:52:24,708 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=32.96 vs. limit=8.9
2023-11-18 01:52:33,492 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=33.36 vs. limit=8.925
2023-11-18 01:52:44,281 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.64 vs. limit=8.925
2023-11-18 01:52:46,525 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=64.38 vs. limit=8.95
2023-11-18 01:52:47,691 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=45.09 vs. limit=8.95
2023-11-18 01:52:49,773 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=2.025e+02
2023-11-18 01:52:50,160 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=21.34 vs. limit=8.95
2023-11-18 01:53:05,357 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=29.59 vs. limit=8.975
2023-11-18 01:53:07,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=3933.3333333333335, ans=8.975
2023-11-18 01:53:08,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3933.3333333333335, ans=0.05249999999999999
2023-11-18 01:53:12,517 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 600, loss[loss=0.3671, simple_loss=0.2767, pruned_loss=0.3045, audio_tagging_loss=0.01919, over 16005.00 frames. ], tot_loss[loss=0.3877, simple_loss=0.2994, pruned_loss=0.3379, audio_tagging_loss=0.02261, over 2896829.31 frames. ], batch size: 60, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:53:13,652 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 4.102e+01 5.824e+01 6.784e+01 8.267e+01 3.333e+02, threshold=1.357e+02, percent-clipped=4.0
2023-11-18 01:53:20,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4000.0, ans=0.04999999999999999
2023-11-18 01:53:25,398 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.04 vs. limit=9.025
2023-11-18 01:53:31,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=4066.6666666666665, ans=5.626666666666667
2023-11-18 01:53:37,769 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=23.33 vs. limit=9.05
2023-11-18 01:53:43,968 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=17.27 vs. limit=9.05
2023-11-18 01:53:45,044 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.58 vs. limit=10.6
2023-11-18 01:53:48,193 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.78 vs. limit=9.05
2023-11-18 01:53:51,628 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.95 vs. limit=6.05
2023-11-18 01:53:54,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4200.0, ans=0.009956521739130435
2023-11-18 01:53:54,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4200.0, ans=0.303125
2023-11-18 01:53:56,370 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=36.95 vs. limit=9.075
2023-11-18 01:54:16,222 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=21.00 vs. limit=9.125
2023-11-18 01:54:16,776 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 650, loss[loss=0.4961, simple_loss=0.3836, pruned_loss=0.412, audio_tagging_loss=0.01306, over 14565.00 frames. ], tot_loss[loss=0.3825, simple_loss=0.2941, pruned_loss=0.3292, audio_tagging_loss=0.02141, over 2930754.18 frames. ], batch size: 52, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:54:20,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4333.333333333333, ans=0.0
2023-11-18 01:54:21,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4333.333333333333, ans=0.04861111111111111
2023-11-18 01:54:29,407 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=48.83 vs. limit=9.15
2023-11-18 01:54:34,106 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.75 vs. limit=6.1
2023-11-18 01:54:38,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4400.0, ans=0.04833333333333334
2023-11-18 01:54:53,977 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.91 vs. limit=10.9
2023-11-18 01:54:56,504 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.90 vs. limit=9.2
2023-11-18 01:55:00,023 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.54 vs. limit=6.133333333333333
2023-11-18 01:55:02,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4533.333333333333, ans=0.2875
2023-11-18 01:55:03,199 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=10.9
2023-11-18 01:55:05,540 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.93 vs. limit=10.9
2023-11-18 01:55:06,718 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.94 vs. limit=9.225
2023-11-18 01:55:08,869 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.71 vs. limit=10.95
2023-11-18 01:55:10,201 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=9.225
2023-11-18 01:55:13,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4600.0, ans=0.035625000000000004
2023-11-18 01:55:13,780 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.92 vs. limit=10.95
2023-11-18 01:55:15,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4600.0, ans=0.739
2023-11-18 01:55:16,054 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=38.61 vs. limit=9.225
2023-11-18 01:55:19,023 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 700, loss[loss=0.387, simple_loss=0.2943, pruned_loss=0.309, audio_tagging_loss=0.01589, over 14229.00 frames. ], tot_loss[loss=0.3797, simple_loss=0.2908, pruned_loss=0.3221, audio_tagging_loss=0.02042, over 2954687.27 frames. ], batch size: 55, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:55:20,173 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.747e+01 8.200e+01 9.584e+01 1.192e+02 3.813e+02, threshold=1.917e+02, percent-clipped=10.0
2023-11-18 01:55:29,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4666.666666666667, ans=0.009855072463768115
2023-11-18 01:55:35,822 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.21 vs. limit=3.71
2023-11-18 01:55:46,290 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=20.22 vs. limit=9.3
2023-11-18 01:55:49,332 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=20.25 vs. limit=9.3
2023-11-18 01:55:49,717 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.43 vs. limit=9.3
2023-11-18 01:55:50,574 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=34.99 vs. limit=9.3
2023-11-18 01:55:55,158 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=28.46 vs. limit=9.3
2023-11-18 01:56:00,143 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.96 vs. limit=9.325
2023-11-18 01:56:02,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4866.666666666667, ans=0.2513333333333333
2023-11-18 01:56:02,898 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.76 vs. limit=9.325
2023-11-18 01:56:08,486 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=30.48 vs. limit=9.35
2023-11-18 01:56:09,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4933.333333333333, ans=0.7273333333333334
2023-11-18 01:56:15,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4933.333333333333, ans=0.0
2023-11-18 01:56:16,064 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.91 vs. limit=6.233333333333333
2023-11-18 01:56:19,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4933.333333333333, ans=0.26875
2023-11-18 01:56:19,821 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=24.94 vs. limit=9.35
2023-11-18 01:56:21,017 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.88 vs. limit=9.375
2023-11-18 01:56:21,541 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 750, loss[loss=0.3825, simple_loss=0.2906, pruned_loss=0.3035, audio_tagging_loss=0.01351, over 13745.00 frames. ], tot_loss[loss=0.3787, simple_loss=0.289, pruned_loss=0.3163, audio_tagging_loss=0.01973, over 2972669.45 frames. ], batch size: 52, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:56:24,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5000.0, ans=0.25
2023-11-18 01:56:37,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=5066.666666666667, ans=0.2625
2023-11-18 01:56:37,943 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=26.87 vs. limit=9.4
2023-11-18 01:56:47,181 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.146e+01
2023-11-18 01:56:48,721 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.82 vs. limit=9.425
2023-11-18 01:57:00,949 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.10 vs. limit=9.45
2023-11-18 01:57:04,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5200.0, ans=0.25625
2023-11-18 01:57:05,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=5200.0, ans=8.25
2023-11-18 01:57:11,587 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=28.46 vs. limit=9.475
2023-11-18 01:57:18,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5266.666666666667, ans=0.253125
2023-11-18 01:57:24,939 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 800, loss[loss=0.4453, simple_loss=0.3383, pruned_loss=0.3408, audio_tagging_loss=0.0178, over 15401.00 frames. ], tot_loss[loss=0.3753, simple_loss=0.2854, pruned_loss=0.3078, audio_tagging_loss=0.01935, over 2992266.65 frames. ], batch size: 59, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:57:26,675 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.22 vs. limit=11.5
2023-11-18 01:57:27,267 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.496e+01 8.780e+01 1.132e+02 1.440e+02 3.329e+02, threshold=2.265e+02, percent-clipped=7.0
2023-11-18 01:57:27,949 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.43 vs. limit=6.333333333333333
2023-11-18 01:57:28,985 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.10 vs. limit=9.5
2023-11-18 01:57:34,998 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.09 vs. limit=6.333333333333333
2023-11-18 01:57:43,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=5400.0, ans=8.375 2023-11-18 01:57:50,947 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.74 vs. limit=9.55 2023-11-18 01:57:57,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=5466.666666666667, ans=0.04949747468305833 2023-11-18 01:58:15,677 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.54 vs. limit=7.8 2023-11-18 01:58:25,505 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 850, loss[loss=0.3864, simple_loss=0.2964, pruned_loss=0.289, audio_tagging_loss=0.01355, over 15254.00 frames. ], tot_loss[loss=0.369, simple_loss=0.2801, pruned_loss=0.2965, audio_tagging_loss=0.01902, over 3004468.79 frames. ], batch size: 56, lr: 4.49e-02, grad_scale: 16.0 2023-11-18 01:58:31,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5666.666666666667, ans=0.234375 2023-11-18 01:58:45,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=5733.333333333333, ans=0.04949747468305833 2023-11-18 01:58:54,268 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.52 vs. limit=7.9 2023-11-18 01:59:12,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5866.666666666667, ans=0.24133333333333332 2023-11-18 01:59:13,909 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.18 vs. limit=9.725 2023-11-18 01:59:17,437 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=22.97 vs. limit=9.725 2023-11-18 01:59:18,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=5933.333333333333, ans=0.221875 2023-11-18 01:59:21,239 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.94 vs. limit=11.95 2023-11-18 01:59:21,463 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.59 vs. limit=9.725 2023-11-18 01:59:26,565 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 900, loss[loss=0.398, simple_loss=0.3084, pruned_loss=0.2836, audio_tagging_loss=0.01564, over 16041.00 frames. ], tot_loss[loss=0.3614, simple_loss=0.2747, pruned_loss=0.2834, audio_tagging_loss=0.01867, over 3015770.74 frames. ], batch size: 58, lr: 4.48e-02, grad_scale: 16.0
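Each loss[...] record splits the objective into simple_loss (the cheap full-sum transducer term), pruned_loss (the exact transducer term evaluated inside the pruned lattice) and audio_tagging_loss from the tagging head. In icefall's pruned-transducer recipes these are typically blended with warmup-dependent weights, roughly as sketched below; the constants (warm_step=2000, a 0.5 final simple-loss scale, a 0.1 pruned-loss floor) are illustrative of that recipe family rather than read from this exact branch:

    def combine_losses(simple_loss, pruned_loss, audio_tagging_loss,
                       batch_idx_train, warm_step=2000,
                       simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
        # Early in training, lean on simple_loss and phase pruned_loss in;
        # after warm_step, use the configured scales directly.
        if batch_idx_train >= warm_step:
            s, p = simple_loss_scale, 1.0
        else:
            frac = batch_idx_train / warm_step
            s = 1.0 - frac * (1.0 - simple_loss_scale)
            p = 0.1 + 0.9 * frac
        return (s * simple_loss + p * pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

This also explains why the printed loss is not a plain sum of the three components during these first few thousand batches.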
2023-11-18 01:59:28,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=6000.0, ans=0.009565217391304347 2023-11-18 01:59:28,875 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.579e+01 7.921e+01 9.785e+01 1.252e+02 2.736e+02, threshold=1.957e+02, percent-clipped=4.0 2023-11-18 01:59:30,620 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.04 vs. limit=9.75 2023-11-18 01:59:38,178 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.65 vs. limit=12.0 2023-11-18 01:59:41,797 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.67 vs. limit=12.05 2023-11-18 01:59:48,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=6066.666666666667, ans=0.04138888888888889 2023-11-18 01:59:48,598 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=23.34 vs. limit=9.775 2023-11-18 01:59:49,844 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=12.05 2023-11-18 01:59:53,047 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.97 vs. limit=8.066666666666666 2023-11-18 02:00:02,660 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.79 vs. limit=12.15 2023-11-18 02:00:14,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=6266.666666666667, ans=0.009507246376811595 2023-11-18 02:00:15,760 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.36 vs. limit=8.133333333333333 2023-11-18 02:00:17,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=6266.666666666667, ans=0.030416666666666668 2023-11-18 02:00:18,776 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.02 vs. limit=12.2 2023-11-18 02:00:26,494 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.02 vs. limit=9.85 2023-11-18 02:00:27,584 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.67 vs. limit=9.875 2023-11-18 02:00:28,104 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 950, loss[loss=0.3259, simple_loss=0.2556, pruned_loss=0.2175, audio_tagging_loss=0.01623, over 14809.00 frames. ], tot_loss[loss=0.35, simple_loss=0.2669, pruned_loss=0.2676, audio_tagging_loss=0.0181, over 3019680.05 frames. ], batch size: 55, lr: 4.48e-02, grad_scale: 8.0
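The grad_scale printed at the end of every loss record is the fp16 loss scale of torch's AMP GradScaler: it was 16.0 at batch 900 above and is 8.0 from batch 950 onward, meaning a non-finite gradient was detected, that update was skipped, and the scale was halved; later records show it climbing back to 16.0 and then 32.0 once enough consecutive steps succeed. A minimal sketch of the standard pattern (train_loader, model, optimizer and compute_loss are placeholders, not the recipe's actual objects):

    import torch

    scaler = torch.cuda.amp.GradScaler()      # the source of `grad_scale`
    for batch in train_loader:
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # skips the update if grads contain inf/nan
        scaler.update()          # halves the scale on overflow, doubles it
                                 # after enough clean steps (growth_interval)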
2023-11-18 02:00:47,908 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.664e+01 2023-11-18 02:00:57,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=6466.666666666667, ans=0.03972222222222222 2023-11-18 02:00:57,890 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.48 vs. limit=12.35 2023-11-18 02:01:09,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=6533.333333333333, ans=0.009449275362318842 2023-11-18 02:01:18,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=6600.0, ans=0.23399999999999999 2023-11-18 02:01:27,318 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 1000, loss[loss=0.3315, simple_loss=0.2632, pruned_loss=0.221, audio_tagging_loss=0.01259, over 14439.00 frames. ], tot_loss[loss=0.3429, simple_loss=0.2633, pruned_loss=0.2551, audio_tagging_loss=0.01731, over 3029875.32 frames. ], batch size: 54, lr: 4.48e-02, grad_scale: 8.0 2023-11-18 02:01:28,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=6666.666666666667, ans=0.009420289855072464 2023-11-18 02:01:30,669 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.563e+01 9.019e+01 1.486e+02 2.475e+02 7.919e+02, threshold=2.973e+02, percent-clipped=36.0 2023-11-18 02:01:30,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=6666.666666666667, ans=0.1875 2023-11-18 02:01:53,966 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:01:54,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=6800.0, ans=0.0 2023-11-18 02:01:55,678 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.29 vs. limit=12.6 2023-11-18 02:02:06,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=6866.666666666667, ans=0.03805555555555556 2023-11-18 02:02:20,616 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.80 vs. limit=10.1 2023-11-18 02:02:26,184 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 1050, loss[loss=0.2344, simple_loss=0.1817, pruned_loss=0.1437, audio_tagging_loss=0.01814, over 14458.00 frames. ], tot_loss[loss=0.331, simple_loss=0.2555, pruned_loss=0.2393, audio_tagging_loss=0.01711, over 3031097.25 frames. ], batch size: 55, lr: 4.48e-02, grad_scale: 8.0
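The WARNING above is the recipe's length filter firing on a 1-second AudioSet clip: 100 feature frames shrink to 23 after the convolutional front-end's roughly 4x subsampling, which is fewer than the 24 BPE tokens of its placeholder transcript, and the pruned transducer needs at least as many output frames as tokens. The arithmetic matches the usual icefall-style filter (a sketch; the exact expression may differ slightly between branches):

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # frames surviving the Conv2dSubsampling front-end (factor ~4)
        t = ((num_frames - 7) // 2 + 1) // 2      # 100 -> 23, as in the log
        return t >= num_tokens                    # 23 < 24 -> cut excluded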
2023-11-18 02:02:26,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=7000.0, ans=0.009347826086956522 2023-11-18 02:02:35,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=7000.0, ans=0.22999999999999998 2023-11-18 02:02:42,034 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.35 vs. limit=6.826666666666666 2023-11-18 02:02:59,873 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. limit=10.175 2023-11-18 02:03:16,162 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:03:20,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=7266.666666666667, ans=0.6456666666666666 2023-11-18 02:03:20,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=7266.666666666667, ans=0.159375 2023-11-18 02:03:25,426 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 1100, loss[loss=0.2444, simple_loss=0.1923, pruned_loss=0.1442, audio_tagging_loss=0.01912, over 15044.00 frames. ], tot_loss[loss=0.3203, simple_loss=0.2491, pruned_loss=0.2251, audio_tagging_loss=0.01659, over 3039672.64 frames. ], batch size: 56, lr: 4.48e-02, grad_scale: 8.0 2023-11-18 02:03:27,103 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.20 vs. limit=13.0 2023-11-18 02:03:28,769 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 1.098e+02 1.778e+02 2.963e+02 6.822e+02, threshold=3.557e+02, percent-clipped=25.0 2023-11-18 02:03:28,833 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:03:36,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=7400.0, ans=10.275 2023-11-18 02:03:37,362 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=20.70 vs. limit=10.275 2023-11-18 02:03:41,696 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.34 vs. limit=6.85
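The scaling.py:213 "ScheduledFloat" records trace regularization hyperparameters (dropout probabilities, skip rates, balancer bounds) that are piecewise-linear functions of batch_count rather than constants. The out_proj.dropout_p values in this stream are consistent with a schedule from 0.3 at batch_count 0 down to 0.1 at batch_count 20000: at batch_count=7000 that gives 0.3 - 0.2 * 7000/20000 = 0.23, exactly the logged ans above. A minimal sketch of such a schedule (the real class supports defaults and arithmetic between schedules; this is just the interpolation):

    def scheduled_float(points, batch_count):
        # points: [(batch_count, value), ...] sorted by batch_count
        b0, v0 = points[0]
        if batch_count <= b0:
            return v0
        for b1, v1 in points[1:]:
            if batch_count <= b1:
                return v0 + (v1 - v0) * (batch_count - b0) / (b1 - b0)
            b0, v0 = b1, v1
        return v0          # clamp past the last breakpoint

    dropout_p = scheduled_float([(0.0, 0.3), (20000.0, 0.1)], 7000.0)  # -> 0.23

Scheduling these values lets the model train with strong regularization early and relax it as the parameters settle.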
2023-11-18 02:03:43,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=7400.0, ans=0.153125 2023-11-18 02:04:00,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=7533.333333333333, ans=0.05291666666666667 2023-11-18 02:04:10,892 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.21 vs. limit=10.35 2023-11-18 02:04:22,644 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 1150, loss[loss=0.2654, simple_loss=0.2151, pruned_loss=0.1553, audio_tagging_loss=0.01616, over 14716.00 frames. ], tot_loss[loss=0.3089, simple_loss=0.242, pruned_loss=0.2109, audio_tagging_loss=0.01621, over 3040160.17 frames. ], batch size: 56, lr: 4.47e-02, grad_scale: 8.0 2023-11-18 02:04:27,412 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.91 vs. limit=8.833333333333334 2023-11-18 02:04:28,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=7666.666666666667, ans=0.6316666666666666 2023-11-18 02:04:33,073 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.20 vs. limit=13.3 2023-11-18 02:04:42,152 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.43 vs. limit=6.933333333333334 2023-11-18 02:05:01,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=7866.666666666667, ans=0.13124999999999998 2023-11-18 02:05:02,513 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=13.4 2023-11-18 02:05:20,093 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 1200, loss[loss=0.2666, simple_loss=0.2181, pruned_loss=0.1567, audio_tagging_loss=0.01358, over 15105.00 frames. ], tot_loss[loss=0.3, simple_loss=0.2368, pruned_loss=0.1993, audio_tagging_loss=0.01586, over 3030286.56 frames. ], batch size: 56, lr: 4.47e-02, grad_scale: 16.0 2023-11-18 02:05:23,350 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.165e+01 1.072e+02 1.842e+02 2.807e+02 8.662e+02, threshold=3.683e+02, percent-clipped=14.0 2023-11-18 02:05:33,105 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.00 vs. limit=4.21 2023-11-18 02:05:45,078 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.70 vs. limit=9.066666666666666 2023-11-18 02:05:50,592 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.78 vs. limit=10.55 2023-11-18 02:05:56,897 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.00 vs.
limit=10.575 2023-11-18 02:06:10,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8266.666666666666, ans=0.21733333333333332 2023-11-18 02:06:17,061 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 1250, loss[loss=0.2499, simple_loss=0.2072, pruned_loss=0.1371, audio_tagging_loss=0.01662, over 15502.00 frames. ], tot_loss[loss=0.2878, simple_loss=0.2284, pruned_loss=0.1861, audio_tagging_loss=0.01584, over 3029412.37 frames. ], batch size: 56, lr: 4.47e-02, grad_scale: 16.0 2023-11-18 02:06:32,768 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.02 vs. limit=7.359999999999999 2023-11-18 02:06:44,716 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.27 vs. limit=13.85 2023-11-18 02:06:47,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=8466.666666666666, ans=0.0 2023-11-18 02:06:51,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=8533.333333333334, ans=0.03111111111111111 2023-11-18 02:06:55,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=8533.333333333334, ans=0.009014492753623189 2023-11-18 02:06:57,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=8533.333333333334, ans=0.05 2023-11-18 02:07:08,182 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.93 vs. limit=9.3 2023-11-18 02:07:10,385 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.88 vs. limit=13.95 2023-11-18 02:07:13,972 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 1300, loss[loss=0.2561, simple_loss=0.2104, pruned_loss=0.1487, audio_tagging_loss=0.01223, over 14681.00 frames. ], tot_loss[loss=0.277, simple_loss=0.2213, pruned_loss=0.1745, audio_tagging_loss=0.01576, over 3029323.85 frames. ], batch size: 54, lr: 4.47e-02, grad_scale: 16.0 2023-11-18 02:07:17,221 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.700e+01 1.001e+02 1.539e+02 2.707e+02 8.460e+02, threshold=3.079e+02, percent-clipped=10.0 2023-11-18 02:07:17,682 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.48 vs. limit=10.75 2023-11-18 02:07:19,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=8666.666666666666, ans=0.125 2023-11-18 02:07:19,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=8666.666666666666, ans=0.32999999999999996 2023-11-18 02:07:26,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=8733.333333333334, ans=0.030277777777777775 2023-11-18 02:07:32,559 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.88 vs. 
limit=10.775 2023-11-18 02:07:34,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=8733.333333333334, ans=0.21266666666666667 2023-11-18 02:07:36,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8800.0, ans=0.212 2023-11-18 02:07:50,114 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.21 vs. limit=9.433333333333334 2023-11-18 02:08:02,267 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.73 vs. limit=10.85 2023-11-18 02:08:05,532 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.77 vs. limit=10.85 2023-11-18 02:08:07,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=8933.333333333334, ans=0.029444444444444443 2023-11-18 02:08:10,226 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 1350, loss[loss=0.2022, simple_loss=0.1627, pruned_loss=0.1076, audio_tagging_loss=0.01833, over 15632.00 frames. ], tot_loss[loss=0.2691, simple_loss=0.2165, pruned_loss=0.1652, audio_tagging_loss=0.01569, over 3031378.84 frames. ], batch size: 59, lr: 4.46e-02, grad_scale: 16.0 2023-11-18 02:08:18,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=9000.0, ans=0.125 2023-11-18 02:08:31,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=9066.666666666666, ans=0.028888888888888895 2023-11-18 02:08:34,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=9133.333333333334, ans=0.5803333333333334 2023-11-18 02:08:38,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=9133.333333333334, ans=0.20866666666666667 2023-11-18 02:08:40,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=9133.333333333334, ans=0.5803333333333334 2023-11-18 02:08:52,813 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:08:57,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=9266.666666666666, ans=0.33899999999999997 2023-11-18 02:09:03,745 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.77 vs. 
limit=14.45 2023-11-18 02:09:05,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=9266.666666666666, ans=0.008855072463768116 2023-11-18 02:09:09,607 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 1400, loss[loss=0.2118, simple_loss=0.1721, pruned_loss=0.1135, audio_tagging_loss=0.01711, over 13660.00 frames. ], tot_loss[loss=0.2624, simple_loss=0.2127, pruned_loss=0.1571, audio_tagging_loss=0.01569, over 3034238.26 frames. ], batch size: 54, lr: 4.46e-02, grad_scale: 16.0 2023-11-18 02:09:10,231 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.64 vs. limit=9.666666666666668 2023-11-18 02:09:12,848 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 1.322e+02 1.809e+02 2.689e+02 4.159e+02, threshold=3.617e+02, percent-clipped=14.0 2023-11-18 02:09:28,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=9400.0, ans=0.008826086956521739 2023-11-18 02:09:28,759 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.66 vs. limit=11.025 2023-11-18 02:09:52,153 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.97 vs. limit=9.766666666666667 2023-11-18 02:09:53,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=9533.333333333334, ans=0.20466666666666666 2023-11-18 02:10:05,820 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 1450, loss[loss=0.2803, simple_loss=0.247, pruned_loss=0.1463, audio_tagging_loss=0.01276, over 15434.00 frames. ], tot_loss[loss=0.2581, simple_loss=0.2108, pruned_loss=0.151, audio_tagging_loss=0.01558, over 3035439.45 frames. ], batch size: 56, lr: 4.46e-02, grad_scale: 16.0 2023-11-18 02:10:12,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=9666.666666666666, ans=0.026388888888888892 2023-11-18 02:10:15,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=9733.333333333334, ans=0.125 2023-11-18 02:10:18,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=9733.333333333334, ans=0.125 2023-11-18 02:10:19,515 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.69 vs. limit=14.8 2023-11-18 02:10:29,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=9800.0, ans=0.125 2023-11-18 02:10:29,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=9800.0, ans=0.0 2023-11-18 02:10:31,643 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.05 vs. 
limit=11.175 2023-11-18 02:10:45,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=9866.666666666666, ans=0.02555555555555556 2023-11-18 02:10:58,127 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.54 vs. limit=7.483333333333333 2023-11-18 02:10:59,002 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.89 vs. limit=11.225 2023-11-18 02:11:01,694 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 1500, loss[loss=0.2481, simple_loss=0.2154, pruned_loss=0.1311, audio_tagging_loss=0.01185, over 14775.00 frames. ], tot_loss[loss=0.2532, simple_loss=0.2083, pruned_loss=0.1452, audio_tagging_loss=0.01547, over 3037502.17 frames. ], batch size: 54, lr: 4.46e-02, grad_scale: 16.0 2023-11-18 02:11:04,894 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.893e+01 1.138e+02 1.532e+02 2.102e+02 5.614e+02, threshold=3.064e+02, percent-clipped=6.0 2023-11-18 02:11:14,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=10066.666666666666, ans=0.125 2023-11-18 02:11:25,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=10133.333333333334, ans=0.19866666666666666 2023-11-18 02:11:27,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=10133.333333333334, ans=0.125 2023-11-18 02:11:32,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=10133.333333333334, ans=0.5453333333333334 2023-11-18 02:11:39,188 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=11.325 2023-11-18 02:11:42,339 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.19 vs. limit=7.55 2023-11-18 02:11:44,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=10200.0, ans=0.125 2023-11-18 02:11:50,744 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.63 vs. limit=7.566666666666666 2023-11-18 02:11:59,191 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 1550, loss[loss=0.1981, simple_loss=0.1692, pruned_loss=0.09932, audio_tagging_loss=0.0153, over 14707.00 frames. ], tot_loss[loss=0.2453, simple_loss=0.2025, pruned_loss=0.1379, audio_tagging_loss=0.01566, over 3042595.08 frames. ], batch size: 55, lr: 4.45e-02, grad_scale: 16.0 2023-11-18 02:12:15,508 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.13 vs. limit=15.3 2023-11-18 02:12:24,219 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.38 vs. 
limit=4.57 2023-11-18 02:12:31,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=10533.333333333334, ans=0.19466666666666665 2023-11-18 02:12:34,792 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:12:41,592 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.82 vs. limit=11.45 2023-11-18 02:12:56,142 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 1600, loss[loss=0.2107, simple_loss=0.1749, pruned_loss=0.107, audio_tagging_loss=0.01805, over 15840.00 frames. ], tot_loss[loss=0.2413, simple_loss=0.2006, pruned_loss=0.1333, audio_tagging_loss=0.01545, over 3043890.60 frames. ], batch size: 61, lr: 4.45e-02, grad_scale: 32.0 2023-11-18 02:12:58,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=10666.666666666666, ans=0.19333333333333336 2023-11-18 02:12:59,355 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.594e+01 1.048e+02 1.443e+02 2.212e+02 4.225e+02, threshold=2.886e+02, percent-clipped=6.0 2023-11-18 02:13:18,358 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.45 vs. limit=8.32 2023-11-18 02:13:20,562 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.32 vs. limit=15.6 2023-11-18 02:13:20,585 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.16 vs. limit=11.55 2023-11-18 02:13:36,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=10866.666666666666, ans=0.125 2023-11-18 02:13:41,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=10933.333333333334, ans=0.125 2023-11-18 02:13:51,852 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 1650, loss[loss=0.218, simple_loss=0.1785, pruned_loss=0.1183, audio_tagging_loss=0.01339, over 14558.00 frames. ], tot_loss[loss=0.2345, simple_loss=0.1959, pruned_loss=0.1274, audio_tagging_loss=0.01552, over 3043095.62 frames. ], batch size: 56, lr: 4.45e-02, grad_scale: 16.0 2023-11-18 02:14:01,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11000.0, ans=0.19 2023-11-18 02:14:16,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=11133.333333333334, ans=0.020277777777777773 2023-11-18 02:14:47,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=11333.333333333334, ans=0.5033333333333334 2023-11-18 02:14:48,709 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 1700, loss[loss=0.2405, simple_loss=0.2076, pruned_loss=0.1267, audio_tagging_loss=0.01149, over 14851.00 frames. ], tot_loss[loss=0.2315, simple_loss=0.1947, pruned_loss=0.1239, audio_tagging_loss=0.01538, over 3050451.03 frames. 
], batch size: 56, lr: 4.44e-02, grad_scale: 16.0 2023-11-18 02:14:53,000 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.938e+01 1.231e+02 1.950e+02 2.730e+02 7.528e+02, threshold=3.901e+02, percent-clipped=22.0 2023-11-18 02:15:03,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=11400.0, ans=0.125 2023-11-18 02:15:12,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=11466.666666666666, ans=0.4986666666666667 2023-11-18 02:15:35,323 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.58 vs. limit=11.85 2023-11-18 02:15:44,874 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 1750, loss[loss=0.1569, simple_loss=0.1402, pruned_loss=0.07406, audio_tagging_loss=0.01236, over 14939.00 frames. ], tot_loss[loss=0.2264, simple_loss=0.1911, pruned_loss=0.1198, audio_tagging_loss=0.01517, over 3051058.57 frames. ], batch size: 60, lr: 4.44e-02, grad_scale: 16.0 2023-11-18 02:15:57,376 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.70 vs. limit=11.9 2023-11-18 02:16:02,228 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:16:25,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11866.666666666666, ans=0.18133333333333335 2023-11-18 02:16:32,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=11933.333333333334, ans=0.18066666666666667 2023-11-18 02:16:35,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=11933.333333333334, ans=10.0 2023-11-18 02:16:41,139 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 1800, loss[loss=0.1848, simple_loss=0.1525, pruned_loss=0.09263, audio_tagging_loss=0.01662, over 15540.00 frames. ], tot_loss[loss=0.2232, simple_loss=0.1899, pruned_loss=0.1168, audio_tagging_loss=0.01477, over 3048559.42 frames. ], batch size: 61, lr: 4.44e-02, grad_scale: 16.0 2023-11-18 02:16:45,460 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.752e+01 1.122e+02 1.379e+02 2.095e+02 9.381e+02, threshold=2.759e+02, percent-clipped=5.0 2023-11-18 02:16:57,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=12066.666666666666, ans=0.01638888888888889 2023-11-18 02:17:02,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=12133.333333333334, ans=0.125 2023-11-18 02:17:24,300 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.52 vs. limit=12.075 2023-11-18 02:17:37,656 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 1850, loss[loss=0.1357, simple_loss=0.11, pruned_loss=0.062, audio_tagging_loss=0.0188, over 15012.00 frames. ], tot_loss[loss=0.2177, simple_loss=0.1861, pruned_loss=0.1125, audio_tagging_loss=0.01475, over 3056171.10 frames. 
], batch size: 58, lr: 4.43e-02, grad_scale: 16.0 2023-11-18 02:17:46,914 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.37 vs. limit=11.166666666666668 2023-11-18 02:18:06,853 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.34 vs. limit=16.85 2023-11-18 02:18:18,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=12533.333333333334, ans=10.0 2023-11-18 02:18:19,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=12533.333333333334, ans=0.125 2023-11-18 02:18:21,989 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.76 vs. limit=4.89 2023-11-18 02:18:26,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=12600.0, ans=0.125 2023-11-18 02:18:32,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=12666.666666666666, ans=0.125 2023-11-18 02:18:33,429 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 1900, loss[loss=0.2467, simple_loss=0.2177, pruned_loss=0.1265, audio_tagging_loss=0.01162, over 15739.00 frames. ], tot_loss[loss=0.2154, simple_loss=0.1852, pruned_loss=0.1103, audio_tagging_loss=0.01457, over 3056644.25 frames. ], batch size: 57, lr: 4.43e-02, grad_scale: 16.0 2023-11-18 02:18:34,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=12666.666666666666, ans=0.125 2023-11-18 02:18:37,661 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.647e+01 1.124e+02 1.503e+02 2.193e+02 6.798e+02, threshold=3.006e+02, percent-clipped=14.0 2023-11-18 02:18:45,295 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.33 vs. limit=17.05 2023-11-18 02:18:51,700 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.22 vs. limit=8.183333333333334 2023-11-18 02:19:29,592 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 1950, loss[loss=0.1393, simple_loss=0.1188, pruned_loss=0.06607, audio_tagging_loss=0.01386, over 15174.00 frames. ], tot_loss[loss=0.212, simple_loss=0.1835, pruned_loss=0.1073, audio_tagging_loss=0.01446, over 3051600.69 frames. ], batch size: 58, lr: 4.43e-02, grad_scale: 16.0 2023-11-18 02:19:39,844 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.58 vs. 
limit=12.375 2023-11-18 02:19:40,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=13066.666666666666, ans=0.4426666666666667 2023-11-18 02:19:47,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=13066.666666666666, ans=0.125 2023-11-18 02:19:56,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=13133.333333333334, ans=0.07 2023-11-18 02:20:03,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=13200.0, ans=0.011666666666666672 2023-11-18 02:20:19,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=13266.666666666666, ans=0.125 2023-11-18 02:20:26,160 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 2000, loss[loss=0.1604, simple_loss=0.1375, pruned_loss=0.07703, audio_tagging_loss=0.01464, over 15255.00 frames. ], tot_loss[loss=0.2106, simple_loss=0.1834, pruned_loss=0.1057, audio_tagging_loss=0.01445, over 3052922.13 frames. ], batch size: 61, lr: 4.42e-02, grad_scale: 32.0 2023-11-18 02:20:28,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=13333.333333333334, ans=0.125 2023-11-18 02:20:30,374 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.123e+01 1.116e+02 1.535e+02 2.034e+02 3.808e+02, threshold=3.071e+02, percent-clipped=5.0 2023-11-18 02:20:42,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=13400.0, ans=0.125 2023-11-18 02:20:45,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=13400.0, ans=0.125 2023-11-18 02:21:06,617 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:21:15,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=13600.0, ans=0.010000000000000002 2023-11-18 02:21:20,975 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.16 vs. limit=17.75 2023-11-18 02:21:21,362 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 2050, loss[loss=0.2334, simple_loss=0.2111, pruned_loss=0.1154, audio_tagging_loss=0.01235, over 15819.00 frames. ], tot_loss[loss=0.2097, simple_loss=0.1837, pruned_loss=0.1045, audio_tagging_loss=0.01436, over 3054816.79 frames. ], batch size: 59, lr: 4.42e-02, grad_scale: 32.0 2023-11-18 02:21:50,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=13800.0, ans=0.125 2023-11-18 02:21:51,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=13800.0, ans=0.125 2023-11-18 02:21:55,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=13866.666666666666, ans=0.41466666666666674 2023-11-18 02:21:56,685 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.88 vs. 
limit=8.466666666666667 2023-11-18 02:21:57,095 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.96 vs. limit=17.9 2023-11-18 02:22:06,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=13933.333333333334, ans=0.125 2023-11-18 02:22:11,995 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.58 vs. limit=12.725 2023-11-18 02:22:17,374 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 2100, loss[loss=0.1628, simple_loss=0.144, pruned_loss=0.07701, audio_tagging_loss=0.01377, over 15589.00 frames. ], tot_loss[loss=0.2051, simple_loss=0.1806, pruned_loss=0.1012, audio_tagging_loss=0.01439, over 3055130.91 frames. ], batch size: 58, lr: 4.42e-02, grad_scale: 32.0 2023-11-18 02:22:21,605 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.391e+01 1.118e+02 1.317e+02 1.653e+02 4.106e+02, threshold=2.634e+02, percent-clipped=4.0 2023-11-18 02:22:36,284 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.16 vs. limit=12.775 2023-11-18 02:22:42,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=14133.333333333334, ans=0.125 2023-11-18 02:22:53,588 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.50 vs. limit=18.15 2023-11-18 02:23:04,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=14266.666666666666, ans=0.15733333333333333 2023-11-18 02:23:10,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=14266.666666666666, ans=0.15733333333333333 2023-11-18 02:23:11,089 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.97 vs. limit=18.2 2023-11-18 02:23:13,716 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 2150, loss[loss=0.1959, simple_loss=0.1687, pruned_loss=0.09546, audio_tagging_loss=0.01603, over 15560.00 frames. ], tot_loss[loss=0.2025, simple_loss=0.1792, pruned_loss=0.09908, audio_tagging_loss=0.01436, over 3052879.31 frames. ], batch size: 59, lr: 4.41e-02, grad_scale: 32.0 2023-11-18 02:23:25,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=14400.0, ans=0.125 2023-11-18 02:23:31,431 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.39 vs. limit=8.6 2023-11-18 02:23:39,652 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:23:42,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=14466.666666666666, ans=0.3936666666666667 2023-11-18 02:23:47,737 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
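All of the excluded cuts come from the AudioSet "unbalanced" split: those clips carry audio-tagging labels but no real transcript, so a fixed "Dummy text added as a place holder..." string is attached to keep the ASR pipeline uniform, and the 1-second clips then trip the length filter described earlier. In lhotse terms, two such corpora are typically interleaved with CutSet.mux; a sketch (the paths and weights here are illustrative, not this run's exact manifests):

    from lhotse import CutSet

    libri = CutSet.from_file("data/fbank/librispeech_cuts_train.jsonl.gz")
    audioset = CutSet.from_file("data/fbank/audioset_cuts_unbalanced.jsonl.gz")
    # Randomly interleave the corpora so a batch can mix ASR cuts with
    # tagging-only cuts; weights set the sampling proportions.
    train_cuts = CutSet.mux(libri, audioset, weights=[0.7, 0.3])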
2023-11-18 02:24:10,328 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 2200, loss[loss=0.191, simple_loss=0.1829, pruned_loss=0.09063, audio_tagging_loss=0.008928, over 14307.00 frames. ], tot_loss[loss=0.1993, simple_loss=0.1771, pruned_loss=0.09677, audio_tagging_loss=0.01442, over 3043440.74 frames. ], batch size: 52, lr: 4.41e-02, grad_scale: 32.0 2023-11-18 02:24:12,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=14666.666666666666, ans=0.3866666666666667 2023-11-18 02:24:13,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=14666.666666666666, ans=0.125 2023-11-18 02:24:14,663 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.017e+01 1.117e+02 1.377e+02 2.009e+02 5.109e+02, threshold=2.755e+02, percent-clipped=7.0 2023-11-18 02:24:18,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=14666.666666666666, ans=0.125 2023-11-18 02:25:07,602 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 2250, loss[loss=0.1931, simple_loss=0.1672, pruned_loss=0.09438, audio_tagging_loss=0.01513, over 16178.00 frames. ], tot_loss[loss=0.1978, simple_loss=0.1764, pruned_loss=0.09552, audio_tagging_loss=0.01445, over 3040804.28 frames. ], batch size: 62, lr: 4.40e-02, grad_scale: 32.0 2023-11-18 02:25:35,028 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.45 vs. limit=13.175 2023-11-18 02:25:41,479 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.61 vs. limit=13.2 2023-11-18 02:25:54,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=15266.666666666666, ans=0.14733333333333334 2023-11-18 02:25:56,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=15266.666666666666, ans=0.125 2023-11-18 02:26:05,405 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 2300, loss[loss=0.2226, simple_loss=0.2028, pruned_loss=0.1085, audio_tagging_loss=0.01272, over 15080.00 frames. ], tot_loss[loss=0.1976, simple_loss=0.1769, pruned_loss=0.09493, audio_tagging_loss=0.01444, over 3041764.76 frames. ], batch size: 55, lr: 4.40e-02, grad_scale: 32.0 2023-11-18 02:26:09,709 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.661e+01 1.107e+02 1.429e+02 1.999e+02 3.636e+02, threshold=2.858e+02, percent-clipped=5.0 2023-11-18 02:26:10,429 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.97 vs. limit=13.25 2023-11-18 02:26:13,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.76 vs. limit=19.0 2023-11-18 02:26:29,931 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.83 vs.
limit=13.3 2023-11-18 02:26:35,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=15466.666666666666, ans=0.125 2023-11-18 02:26:42,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=15533.333333333334, ans=0.125 2023-11-18 02:26:51,467 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.33 vs. limit=5.0 2023-11-18 02:26:56,042 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:26:56,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=15600.0, ans=0.14400000000000002 2023-11-18 02:27:01,480 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 2350, loss[loss=0.1911, simple_loss=0.1733, pruned_loss=0.09188, audio_tagging_loss=0.01252, over 16136.00 frames. ], tot_loss[loss=0.195, simple_loss=0.1754, pruned_loss=0.09301, audio_tagging_loss=0.0145, over 3044707.94 frames. ], batch size: 62, lr: 4.40e-02, grad_scale: 32.0 2023-11-18 02:27:13,961 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.76 vs. limit=8.933333333333334 2023-11-18 02:27:36,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=15866.666666666666, ans=10.0 2023-11-18 02:27:37,508 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=13.45 2023-11-18 02:27:44,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=15866.666666666666, ans=0.14133333333333334 2023-11-18 02:27:57,899 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 2400, loss[loss=0.2537, simple_loss=0.2368, pruned_loss=0.1233, audio_tagging_loss=0.01205, over 15778.00 frames. ], tot_loss[loss=0.1926, simple_loss=0.1737, pruned_loss=0.09124, audio_tagging_loss=0.01461, over 3044836.95 frames. 
], batch size: 60, lr: 4.39e-02, grad_scale: 32.0 2023-11-18 02:28:02,147 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.230e+01 1.240e+02 1.395e+02 1.790e+02 3.155e+02, threshold=2.791e+02, percent-clipped=5.0 2023-11-18 02:28:28,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=16133.333333333334, ans=0.125 2023-11-18 02:28:34,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=16200.0, ans=0.125 2023-11-18 02:28:41,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=16266.666666666666, ans=0.125 2023-11-18 02:28:43,868 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.15 vs. limit=13.6 2023-11-18 02:28:54,736 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 2450, loss[loss=0.2067, simple_loss=0.196, pruned_loss=0.09663, audio_tagging_loss=0.01205, over 14967.00 frames. ], tot_loss[loss=0.1917, simple_loss=0.1737, pruned_loss=0.09025, audio_tagging_loss=0.01471, over 3048121.12 frames. ], batch size: 58, lr: 4.39e-02, grad_scale: 32.0 2023-11-18 02:29:04,068 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.85 vs. limit=13.166666666666668 2023-11-18 02:29:08,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=16400.0, ans=0.04830000000000001 2023-11-18 02:29:13,727 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=13.65 2023-11-18 02:29:23,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=16466.666666666668, ans=0.0 2023-11-18 02:29:27,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=16533.333333333332, ans=0.13466666666666668 2023-11-18 02:29:28,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=16533.333333333332, ans=0.13466666666666668 2023-11-18 02:29:31,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=16533.333333333332, ans=0.0 2023-11-18 02:29:49,867 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 2500, loss[loss=0.1741, simple_loss=0.1627, pruned_loss=0.07836, audio_tagging_loss=0.01437, over 14500.00 frames. ], tot_loss[loss=0.1891, simple_loss=0.1717, pruned_loss=0.08856, audio_tagging_loss=0.01482, over 3042761.19 frames. 
], batch size: 56, lr: 4.38e-02, grad_scale: 32.0 2023-11-18 02:29:53,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=16666.666666666668, ans=0.1333333333333333 2023-11-18 02:29:54,092 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.242e+01 1.096e+02 1.316e+02 1.723e+02 3.236e+02, threshold=2.632e+02, percent-clipped=4.0 2023-11-18 02:30:15,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=16800.0, ans=0.0072173913043478265 2023-11-18 02:30:19,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=16800.0, ans=0.0 2023-11-18 02:30:43,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=16933.333333333332, ans=0.0 2023-11-18 02:30:45,350 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 2550, loss[loss=0.1836, simple_loss=0.1721, pruned_loss=0.08377, audio_tagging_loss=0.0138, over 15157.00 frames. ], tot_loss[loss=0.1885, simple_loss=0.1717, pruned_loss=0.08814, audio_tagging_loss=0.01459, over 3044909.61 frames. ], batch size: 57, lr: 4.38e-02, grad_scale: 32.0 2023-11-18 02:30:45,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=17000.0, ans=0.0 2023-11-18 02:30:59,857 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:31:00,207 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.15 vs. limit=13.9 2023-11-18 02:31:03,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=17066.666666666668, ans=0.125 2023-11-18 02:31:09,418 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.49 vs. limit=13.924999999999999 2023-11-18 02:31:40,912 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.91 vs. limit=20.450000000000003 2023-11-18 02:31:43,167 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 2600, loss[loss=0.1437, simple_loss=0.1196, pruned_loss=0.06492, audio_tagging_loss=0.01893, over 14498.00 frames. ], tot_loss[loss=0.186, simple_loss=0.1698, pruned_loss=0.0866, audio_tagging_loss=0.01451, over 3040887.67 frames. 
], batch size: 56, lr: 4.37e-02, grad_scale: 32.0 2023-11-18 02:31:47,404 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.043e+01 1.250e+02 1.620e+02 2.059e+02 4.953e+02, threshold=3.240e+02, percent-clipped=12.0 2023-11-18 02:31:50,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=17333.333333333332, ans=0.125 2023-11-18 02:31:58,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=17400.0, ans=0.125 2023-11-18 02:32:08,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=17466.666666666668, ans=0.125 2023-11-18 02:32:09,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=17466.666666666668, ans=0.125 2023-11-18 02:32:14,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=17466.666666666668, ans=0.0 2023-11-18 02:32:15,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=17533.333333333332, ans=0.07466666666666669 2023-11-18 02:32:24,486 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=8.49 vs. limit=7.506666666666666 2023-11-18 02:32:34,336 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.08 vs. limit=13.8 2023-11-18 02:32:38,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=17666.666666666668, ans=0.0 2023-11-18 02:32:39,191 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 2650, loss[loss=0.1539, simple_loss=0.1458, pruned_loss=0.06465, audio_tagging_loss=0.01641, over 15580.00 frames. ], tot_loss[loss=0.1856, simple_loss=0.1701, pruned_loss=0.08622, audio_tagging_loss=0.01439, over 3043945.75 frames. ], batch size: 60, lr: 4.37e-02, grad_scale: 32.0 2023-11-18 02:32:52,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=17733.333333333332, ans=0.0 2023-11-18 02:32:53,814 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=17.32 vs. limit=14.15 2023-11-18 02:32:54,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=17733.333333333332, ans=0.0 2023-11-18 02:33:30,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=17933.333333333332, ans=10.0 2023-11-18 02:33:31,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=17933.333333333332, ans=0.125 2023-11-18 02:33:34,665 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 2700, loss[loss=0.1621, simple_loss=0.1508, pruned_loss=0.07319, audio_tagging_loss=0.01355, over 14340.00 frames. ], tot_loss[loss=0.1821, simple_loss=0.168, pruned_loss=0.08389, audio_tagging_loss=0.01421, over 3040268.14 frames. 
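], batch size: 54, lr: 4.36e-02, grad_scale: 32.0

The optim.py lines report min/Q1/median/Q3/max of recent gradient norms, and throughout this log the threshold tracks Clipping_scale times the median (just above: 2.0 * 1.620e+02 = 3.240e+02); percent-clipped is the share of recent batches whose norm exceeded it. A rough sketch of that bookkeeping, with the window length and the exact update rule being assumptions:

import torch

class QuartileClipper:
    # Clip to clipping_scale * median of recently observed gradient norms.
    def __init__(self, clipping_scale: float = 2.0, window: int = 50):
        self.scale, self.window, self.norms = clipping_scale, window, []

    def step(self, params):
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms = (self.norms + [norm])[-self.window:]
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * q[2].item()  # Clipping_scale * median
        if norm > threshold:                  # counted in percent-clipped
            for g in grads:
                g.mul_(threshold / norm)
        return q, threshold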
2023-11-18 02:33:34,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=18000.0, ans=0.12000000000000002 2023-11-18 02:33:38,903 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.040e+01 1.101e+02 1.289e+02 1.771e+02 2.746e+02, threshold=2.578e+02, percent-clipped=0.0 2023-11-18 02:33:40,550 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.70 vs. limit=11.2 2023-11-18 02:33:41,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=18000.0, ans=0.9299999999999999 2023-11-18 02:33:48,455 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.68 vs. limit=14.033333333333335 2023-11-18 02:34:31,185 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 2750, loss[loss=0.1263, simple_loss=0.1124, pruned_loss=0.05183, audio_tagging_loss=0.01826, over 15229.00 frames. ], tot_loss[loss=0.1801, simple_loss=0.1666, pruned_loss=0.0826, audio_tagging_loss=0.01421, over 3038680.78 frames. ], batch size: 58, lr: 4.36e-02, grad_scale: 32.0 2023-11-18 02:34:51,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=18400.0, ans=0.0 2023-11-18 02:34:57,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=18466.666666666668, ans=0.0 2023-11-18 02:34:58,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=18466.666666666668, ans=0.2536666666666667 2023-11-18 02:35:01,283 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.26 vs. limit=9.616666666666667 2023-11-18 02:35:03,534 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.97 vs. limit=14.45 2023-11-18 02:35:04,626 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.69 vs. limit=14.45 2023-11-18 02:35:16,570 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.64 vs. limit=9.65 2023-11-18 02:35:20,049 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.14 vs. limit=14.475 2023-11-18 02:35:20,310 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:35:27,839 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 2800, loss[loss=0.1122, simple_loss=0.0996, pruned_loss=0.04464, audio_tagging_loss=0.01779, over 15621.00 frames.
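], tot_loss[loss=0.1795, simple_loss=0.1664, pruned_loss=0.0821, audio_tagging_loss=0.01421, over 3039690.30 frames. ], batch size: 60, lr: 4.36e-02, grad_scale: 32.0

The WARNING above drops an AudioSet cut whose transcript is only placeholder text: after the front-end subsampling, its 100 input frames leave 23, fewer than the 24 BPE tokens, and a transducer needs at least one frame per emitted token. A toy version of the check; modelling the front-end as T -> ((T - 7) // 2) // 2 reproduces the logged 100 -> 23 but is an assumption about the exact convolution stack:

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    t_sub = ((num_frames - 7) // 2) // 2  # 100 -> 23, as in the WARNING
    return t_sub >= num_tokens            # need >= one frame per token

print(keep_cut(100, 24))  # False -> the cut is excluded from training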
2023-11-18 02:35:32,080 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.686e+01 1.129e+02 1.327e+02 1.684e+02 3.032e+02, threshold=2.655e+02, percent-clipped=2.0 2023-11-18 02:35:33,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=18666.666666666668, ans=0.125 2023-11-18 02:35:38,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=18733.333333333332, ans=0.125 2023-11-18 02:35:38,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=18733.333333333332, ans=0.2443333333333334 2023-11-18 02:35:57,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=18800.0, ans=0.0 2023-11-18 02:36:05,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=18866.666666666668, ans=0.11133333333333331 2023-11-18 02:36:11,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=18933.333333333332, ans=0.006753623188405798 2023-11-18 02:36:22,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=19000.0, ans=0.006739130434782609 2023-11-18 02:36:23,507 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 2850, loss[loss=0.126, simple_loss=0.1121, pruned_loss=0.05504, audio_tagging_loss=0.01498, over 14645.00 frames. ], tot_loss[loss=0.1773, simple_loss=0.1648, pruned_loss=0.08071, audio_tagging_loss=0.01415, over 3048558.10 frames. ], batch size: 59, lr: 4.35e-02, grad_scale: 32.0 2023-11-18 02:36:42,083 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.70 vs. limit=14.65 2023-11-18 02:36:47,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=19133.333333333332, ans=0.2303333333333334 2023-11-18 02:36:50,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=19133.333333333332, ans=0.0 2023-11-18 02:36:50,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=19133.333333333332, ans=0.2303333333333334 2023-11-18 02:36:52,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=19133.333333333332, ans=0.10866666666666669 2023-11-18 02:37:00,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=19200.0, ans=0.0 2023-11-18 02:37:14,839 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.27 vs. limit=14.725 2023-11-18 02:37:21,224 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 2900, loss[loss=0.147, simple_loss=0.1401, pruned_loss=0.06488, audio_tagging_loss=0.01208, over 15646.00 frames. ], tot_loss[loss=0.1757, simple_loss=0.1636, pruned_loss=0.07978, audio_tagging_loss=0.01411, over 3049776.29 frames.
], batch size: 58, lr: 4.35e-02, grad_scale: 32.0 2023-11-18 02:37:21,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=19333.333333333332, ans=0.125 2023-11-18 02:37:25,457 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.965e+01 1.019e+02 1.241e+02 1.587e+02 2.643e+02, threshold=2.482e+02, percent-clipped=0.0 2023-11-18 02:37:27,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=19333.333333333332, ans=0.0 2023-11-18 02:37:32,631 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.52 vs. limit=14.775 2023-11-18 02:38:07,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=19600.0, ans=0.0 2023-11-18 02:38:17,189 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 2950, loss[loss=0.1671, simple_loss=0.1638, pruned_loss=0.06956, audio_tagging_loss=0.01564, over 15456.00 frames. ], tot_loss[loss=0.1772, simple_loss=0.1654, pruned_loss=0.08037, audio_tagging_loss=0.01408, over 3050952.10 frames. ], batch size: 57, lr: 4.34e-02, grad_scale: 32.0 2023-11-18 02:38:36,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=19733.333333333332, ans=0.0 2023-11-18 02:39:06,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=19933.333333333332, ans=0.035 2023-11-18 02:39:07,990 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.76 vs. limit=14.975 2023-11-18 02:39:13,990 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 3000, loss[loss=0.1779, simple_loss=0.1585, pruned_loss=0.08174, audio_tagging_loss=0.01696, over 14149.00 frames. ], tot_loss[loss=0.1767, simple_loss=0.1656, pruned_loss=0.07985, audio_tagging_loss=0.01405, over 3045077.41 frames. ], batch size: 55, lr: 4.34e-02, grad_scale: 32.0 2023-11-18 02:39:13,991 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-18 02:39:38,063 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.3830, 4.1413, 4.4911, 4.4064], device='cuda:3') 2023-11-18 02:39:47,911 INFO [train_asr.py:1147] (3/4) Epoch 1, validation: loss=0.1123, simple_loss=0.08353, pruned_loss=0.02777, audio_tagging_loss=0.04274, over 4681554.00 frames. 
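A full validation pass interrupts training here (batch 3000); per this run's flags it recurs every valid_interval = 3000 batches. The cadence, with illustrative names rather than the recipe's actual API:

def should_validate(batch_idx: int, valid_interval: int = 3000) -> bool:
    # Matches the log: validation at batch 3000 of epoch 1.
    return batch_idx % valid_interval == 0

assert should_validate(3000) and not should_validate(2450)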
2023-11-18 02:39:47,912 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-18 02:39:48,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=20000.0, ans=0.125 2023-11-18 02:39:52,080 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.173e+01 1.112e+02 1.246e+02 1.564e+02 3.954e+02, threshold=2.493e+02, percent-clipped=6.0 2023-11-18 02:39:58,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=20066.666666666668, ans=0.125 2023-11-18 02:40:01,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=20066.666666666668, ans=0.07 2023-11-18 02:40:03,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=20066.666666666668, ans=0.2 2023-11-18 02:40:06,794 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.67 vs. limit=12.0 2023-11-18 02:40:09,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=20133.333333333332, ans=0.125 2023-11-18 02:40:17,147 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.63 vs. limit=22.5 2023-11-18 02:40:33,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=20266.666666666668, ans=0.5 2023-11-18 02:40:41,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=20266.666666666668, ans=10.0 2023-11-18 02:40:43,608 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 3050, loss[loss=0.1625, simple_loss=0.1471, pruned_loss=0.07328, audio_tagging_loss=0.01564, over 14262.00 frames. ], tot_loss[loss=0.1764, simple_loss=0.1654, pruned_loss=0.07961, audio_tagging_loss=0.01407, over 3045440.56 frames. ], batch size: 54, lr: 4.33e-02, grad_scale: 32.0 2023-11-18 02:40:44,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=20333.333333333332, ans=0.1 2023-11-18 02:40:46,307 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.42 vs. limit=15.0 2023-11-18 02:40:53,694 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.23 vs. limit=15.0 2023-11-18 02:41:13,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=20466.666666666668, ans=0.125 2023-11-18 02:41:14,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=20466.666666666668, ans=0.0 2023-11-18 02:41:15,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=20466.666666666668, ans=0.04949747468305833 2023-11-18 02:41:18,271 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:41:19,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=20533.333333333332, ans=0.125 2023-11-18 02:41:31,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=20600.0, ans=0.1 2023-11-18 02:41:33,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=20600.0, ans=0.1 2023-11-18 02:41:40,411 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 3100, loss[loss=0.1661, simple_loss=0.1499, pruned_loss=0.07029, audio_tagging_loss=0.02083, over 15345.00 frames. ], tot_loss[loss=0.1756, simple_loss=0.165, pruned_loss=0.07887, audio_tagging_loss=0.0142, over 3046599.56 frames. ], batch size: 56, lr: 4.33e-02, grad_scale: 32.0 2023-11-18 02:41:44,733 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.846e+01 1.051e+02 1.308e+02 1.673e+02 2.696e+02, threshold=2.616e+02, percent-clipped=3.0 2023-11-18 02:42:00,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=20733.333333333332, ans=0.1 2023-11-18 02:42:03,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=20800.0, ans=0.125 2023-11-18 02:42:13,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=20866.666666666668, ans=0.04949747468305833 2023-11-18 02:42:17,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=20866.666666666668, ans=0.125 2023-11-18 02:42:19,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=20866.666666666668, ans=0.1 2023-11-18 02:42:36,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=21000.0, ans=0.0 2023-11-18 02:42:37,782 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 3150, loss[loss=0.1694, simple_loss=0.1478, pruned_loss=0.07725, audio_tagging_loss=0.0183, over 15811.00 frames. ], tot_loss[loss=0.1755, simple_loss=0.1655, pruned_loss=0.07861, audio_tagging_loss=0.0142, over 3045034.95 frames. 
], batch size: 59, lr: 4.32e-02, grad_scale: 32.0 2023-11-18 02:42:40,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=21000.0, ans=0.006304347826086957 2023-11-18 02:43:09,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=21200.0, ans=0.006260869565217392 2023-11-18 02:43:11,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=21200.0, ans=0.125 2023-11-18 02:43:24,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=21266.666666666668, ans=0.0 2023-11-18 02:43:26,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=21266.666666666668, ans=0.1 2023-11-18 02:43:34,054 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 3200, loss[loss=0.1544, simple_loss=0.1441, pruned_loss=0.06606, audio_tagging_loss=0.01634, over 14640.00 frames. ], tot_loss[loss=0.1752, simple_loss=0.1654, pruned_loss=0.07816, audio_tagging_loss=0.01429, over 3045263.35 frames. ], batch size: 55, lr: 4.32e-02, grad_scale: 32.0 2023-11-18 02:43:38,319 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.453e+01 1.064e+02 1.244e+02 1.490e+02 2.410e+02, threshold=2.488e+02, percent-clipped=0.0 2023-11-18 02:43:48,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=21400.0, ans=0.125 2023-11-18 02:43:49,956 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:43:56,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=21466.666666666668, ans=0.125 2023-11-18 02:44:09,288 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0 2023-11-18 02:44:14,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=21533.333333333332, ans=0.125 2023-11-18 02:44:19,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=21600.0, ans=0.125 2023-11-18 02:44:28,584 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.11 vs. limit=22.5 2023-11-18 02:44:30,114 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 3250, loss[loss=0.1426, simple_loss=0.1329, pruned_loss=0.06254, audio_tagging_loss=0.01362, over 16902.00 frames. ], tot_loss[loss=0.1734, simple_loss=0.164, pruned_loss=0.07708, audio_tagging_loss=0.01427, over 3045603.97 frames. ], batch size: 63, lr: 4.31e-02, grad_scale: 32.0 2023-11-18 02:44:37,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=21666.666666666668, ans=0.125 2023-11-18 02:44:57,671 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.64 vs. 
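limit=22.5

The ScheduledFloat lines trace module constants that are piecewise-linear in batch_count and then held: the feed_forward1.out_proj.dropout_p entries are consistent with a ramp from 0.3 at batch 0 to 0.1 at batch 20000 (0.12 at batch_count=18000 earlier in the log) and sit at 0.1 from then on. A small stand-in; the breakpoints for names other than this dropout are assumptions:

def scheduled_float(batch_count: float,
                    schedule=((0.0, 0.3), (20000.0, 0.1))) -> float:
    # Piecewise-linear in batch_count, clamped to the endpoint values.
    if batch_count <= schedule[0][0]:
        return schedule[0][1]
    for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
        if batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
    return schedule[-1][1]

print(scheduled_float(18000.0))  # 0.12, as logged for ...dropout_p
print(scheduled_float(21800.0))  # 0.1: held at the final value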
2023-11-18 02:45:00,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=21800.0, ans=0.1 2023-11-18 02:45:15,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=21933.333333333332, ans=0.1 2023-11-18 02:45:27,668 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 3300, loss[loss=0.1764, simple_loss=0.1746, pruned_loss=0.07539, audio_tagging_loss=0.01371, over 15705.00 frames. ], tot_loss[loss=0.172, simple_loss=0.1625, pruned_loss=0.0765, audio_tagging_loss=0.01427, over 3040767.71 frames. ], batch size: 57, lr: 4.31e-02, grad_scale: 32.0 2023-11-18 02:45:27,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=22000.0, ans=0.0 2023-11-18 02:45:32,504 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.259e+01 1.069e+02 1.225e+02 1.477e+02 2.736e+02, threshold=2.451e+02, percent-clipped=1.0 2023-11-18 02:45:50,637 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.24 vs. limit=22.5 2023-11-18 02:46:19,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=22266.666666666668, ans=0.1 2023-11-18 02:46:21,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=22266.666666666668, ans=0.125 2023-11-18 02:46:23,145 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.40 vs. limit=15.0 2023-11-18 02:46:24,722 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 3350, loss[loss=0.1832, simple_loss=0.1828, pruned_loss=0.07919, audio_tagging_loss=0.01257, over 16180.00 frames. ], tot_loss[loss=0.1725, simple_loss=0.1634, pruned_loss=0.07672, audio_tagging_loss=0.01405, over 3037920.10 frames. ], batch size: 58, lr: 4.30e-02, grad_scale: 32.0 2023-11-18 02:46:34,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=22400.0, ans=0.125 2023-11-18 02:46:47,417 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.54 vs. limit=15.0 2023-11-18 02:47:02,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=22533.333333333332, ans=0.005971014492753624 2023-11-18 02:47:17,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=22600.0, ans=0.1 2023-11-18 02:47:21,311 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 3400, loss[loss=0.2, simple_loss=0.1998, pruned_loss=0.09063, audio_tagging_loss=0.009429, over 15812.00 frames. ], tot_loss[loss=0.1729, simple_loss=0.1645, pruned_loss=0.07692, audio_tagging_loss=0.0137, over 3043451.48 frames.
], batch size: 56, lr: 4.29e-02, grad_scale: 32.0 2023-11-18 02:47:21,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=22666.666666666668, ans=0.125 2023-11-18 02:47:25,579 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.820e+01 1.014e+02 1.234e+02 1.515e+02 3.091e+02, threshold=2.469e+02, percent-clipped=0.0 2023-11-18 02:47:37,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=22733.333333333332, ans=0.125 2023-11-18 02:47:38,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=22733.333333333332, ans=0.125 2023-11-18 02:47:52,198 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.42 vs. limit=15.0 2023-11-18 02:48:11,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=22933.333333333332, ans=22.5 2023-11-18 02:48:13,505 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:48:14,888 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.03 vs. limit=15.0 2023-11-18 02:48:18,060 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 3450, loss[loss=0.1532, simple_loss=0.1451, pruned_loss=0.06513, audio_tagging_loss=0.01554, over 15352.00 frames. ], tot_loss[loss=0.1723, simple_loss=0.1644, pruned_loss=0.07636, audio_tagging_loss=0.01372, over 3043505.30 frames. ], batch size: 58, lr: 4.29e-02, grad_scale: 32.0 2023-11-18 02:48:24,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=23000.0, ans=0.2 2023-11-18 02:48:29,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=23066.666666666668, ans=0.125 2023-11-18 02:48:42,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=23133.333333333332, ans=0.0 2023-11-18 02:48:47,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=23133.333333333332, ans=0.0 2023-11-18 02:48:53,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=23200.0, ans=0.1 2023-11-18 02:49:01,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=23200.0, ans=0.125 2023-11-18 02:49:04,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=23266.666666666668, ans=0.125 2023-11-18 02:49:15,064 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 3500, loss[loss=0.1933, simple_loss=0.1923, pruned_loss=0.0863, audio_tagging_loss=0.0109, over 14649.00 frames. ], tot_loss[loss=0.1713, simple_loss=0.1637, pruned_loss=0.07584, audio_tagging_loss=0.01363, over 3038373.26 frames. 
], batch size: 56, lr: 4.28e-02, grad_scale: 32.0 2023-11-18 02:49:15,835 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.76 vs. limit=15.0 2023-11-18 02:49:19,474 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.153e+01 1.129e+02 1.309e+02 1.633e+02 2.948e+02, threshold=2.617e+02, percent-clipped=2.0 2023-11-18 02:49:26,665 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.66 vs. limit=15.0 2023-11-18 02:49:34,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=23400.0, ans=0.125 2023-11-18 02:49:35,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=23466.666666666668, ans=0.125 2023-11-18 02:49:44,222 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:50:00,523 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.30 vs. limit=15.0 2023-11-18 02:50:10,785 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 3550, loss[loss=0.1971, simple_loss=0.1782, pruned_loss=0.09051, audio_tagging_loss=0.01748, over 15975.00 frames. ], tot_loss[loss=0.1687, simple_loss=0.1609, pruned_loss=0.07456, audio_tagging_loss=0.01374, over 3029308.77 frames. ], batch size: 61, lr: 4.28e-02, grad_scale: 32.0 2023-11-18 02:50:14,798 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.40 vs. limit=15.0 2023-11-18 02:50:28,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=23733.333333333332, ans=0.0 2023-11-18 02:50:52,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=23866.666666666668, ans=0.025 2023-11-18 02:50:58,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=23933.333333333332, ans=0.125 2023-11-18 02:51:03,492 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.37 vs. limit=22.5 2023-11-18 02:51:08,217 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 3600, loss[loss=0.1878, simple_loss=0.1828, pruned_loss=0.08242, audio_tagging_loss=0.014, over 14844.00 frames. ], tot_loss[loss=0.1693, simple_loss=0.162, pruned_loss=0.07472, audio_tagging_loss=0.0136, over 3036383.26 frames. ], batch size: 55, lr: 4.27e-02, grad_scale: 32.0 2023-11-18 02:51:12,719 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.18 vs. 
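limit=6.0

The Whitening lines compare a covariance statistic of a module's output with a scheduled limit (the whiten_keys entry above, 4 groups of 32 key channels, measures 6.18 against limit=6.0; presumably a corrective gradient applies only while the metric exceeds its limit). As a simplified stand-in for scaling.py's actual computation, the ratio mean(eig^2) / mean(eig)^2 of the per-group covariance behaves the same way: 1.0 for perfectly white features, larger as the spectrum becomes lopsided:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); returns 1.0 iff each group's
    # covariance is a multiple of the identity ("white").
    t, c = x.shape
    x = x.reshape(t, num_groups, c // num_groups).transpose(0, 1)
    cov = x.transpose(1, 2) @ x / t                      # (groups, d, d)
    mean_eig_sq = (cov ** 2).sum(dim=(1, 2)) / (c // num_groups)
    sq_mean_eig = torch.diagonal(cov, dim1=1, dim2=2).mean(dim=1) ** 2
    return (mean_eig_sq / sq_mean_eig).mean().item()

print(whitening_metric(torch.randn(1000, 128), num_groups=4))  # close to 1.0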
2023-11-18 02:51:13,840 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.609e+01 1.015e+02 1.156e+02 1.393e+02 2.534e+02, threshold=2.312e+02, percent-clipped=0.0 2023-11-18 02:51:16,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=24000.0, ans=0.1 2023-11-18 02:51:29,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=24066.666666666668, ans=0.125 2023-11-18 02:51:30,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=24133.333333333332, ans=0.1 2023-11-18 02:51:45,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=24200.0, ans=0.125 2023-11-18 02:52:04,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=24333.333333333332, ans=0.0 2023-11-18 02:52:04,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=24333.333333333332, ans=0.1 2023-11-18 02:52:05,125 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.31 vs. limit=22.5 2023-11-18 02:52:05,636 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 3650, loss[loss=0.1856, simple_loss=0.1742, pruned_loss=0.08511, audio_tagging_loss=0.01343, over 15684.00 frames. ], tot_loss[loss=0.1697, simple_loss=0.1622, pruned_loss=0.07496, audio_tagging_loss=0.01366, over 3035159.37 frames. ], batch size: 57, lr: 4.27e-02, grad_scale: 64.0 2023-11-18 02:52:24,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=24400.0, ans=0.125 2023-11-18 02:52:50,065 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.83 vs. limit=6.0 2023-11-18 02:52:55,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=24600.0, ans=0.125 2023-11-18 02:53:01,438 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 3700, loss[loss=0.1259, simple_loss=0.1112, pruned_loss=0.05313, audio_tagging_loss=0.01718, over 13975.00 frames. ], tot_loss[loss=0.1681, simple_loss=0.1606, pruned_loss=0.0741, audio_tagging_loss=0.01375, over 3037498.44 frames. ], batch size: 54, lr: 4.26e-02, grad_scale: 64.0 2023-11-18 02:53:01,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=24666.666666666668, ans=0.1 2023-11-18 02:53:05,626 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 1.068e+02 1.322e+02 1.624e+02 2.925e+02, threshold=2.645e+02, percent-clipped=5.0 2023-11-18 02:53:10,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=24666.666666666668, ans=0.125 2023-11-18 02:53:11,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=24733.333333333332, ans=0.125 2023-11-18 02:53:13,783 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.06 vs.
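limit=22.5

grad_scale in the per-batch lines doubles from 32.0 to 64.0 at batch 3650 above: with fp16 training, the loss is multiplied by a dynamic scale that is reduced when gradients overflow and periodically grown after enough overflow-free steps. This is the behaviour of torch.cuda.amp.GradScaler; whether the recipe uses that class directly or its own variant is an assumption:

import torch

# Growth doubles the scale (e.g. 32.0 -> 64.0) after growth_interval
# consecutive steps with finite gradients; an overflow halves it instead.
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)
# Per batch:
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)
#   scaler.update()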
2023-11-18 02:53:24,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=24800.0, ans=0.005478260869565218 2023-11-18 02:53:28,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=24800.0, ans=0.0 2023-11-18 02:53:29,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=24800.0, ans=0.125 2023-11-18 02:53:40,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=24866.666666666668, ans=0.125 2023-11-18 02:53:49,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=24933.333333333332, ans=0.125 2023-11-18 02:53:58,214 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 3750, loss[loss=0.202, simple_loss=0.1935, pruned_loss=0.09658, audio_tagging_loss=0.008676, over 15511.00 frames. ], tot_loss[loss=0.17, simple_loss=0.1626, pruned_loss=0.07505, audio_tagging_loss=0.01365, over 3040586.61 frames. ], batch size: 56, lr: 4.26e-02, grad_scale: 64.0 2023-11-18 02:54:00,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=25000.0, ans=0.125 2023-11-18 02:54:12,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=25066.666666666668, ans=0.005420289855072463 2023-11-18 02:54:37,591 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:54:56,658 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 3800, loss[loss=0.187, simple_loss=0.1898, pruned_loss=0.07915, audio_tagging_loss=0.01295, over 14777.00 frames. ], tot_loss[loss=0.1696, simple_loss=0.1623, pruned_loss=0.07471, audio_tagging_loss=0.01373, over 3047787.56 frames.
], batch size: 57, lr: 4.25e-02, grad_scale: 64.0 2023-11-18 02:54:57,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=25333.333333333332, ans=0.1 2023-11-18 02:55:01,041 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.087e+01 1.058e+02 1.234e+02 1.426e+02 2.558e+02, threshold=2.469e+02, percent-clipped=0.0 2023-11-18 02:55:01,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=25333.333333333332, ans=0.125 2023-11-18 02:55:06,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=25400.0, ans=0.125 2023-11-18 02:55:12,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=25400.0, ans=0.125 2023-11-18 02:55:38,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=25533.333333333332, ans=0.1 2023-11-18 02:55:53,504 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 3850, loss[loss=0.166, simple_loss=0.1606, pruned_loss=0.07263, audio_tagging_loss=0.01304, over 14405.00 frames. ], tot_loss[loss=0.1688, simple_loss=0.1617, pruned_loss=0.07409, audio_tagging_loss=0.01387, over 3042826.55 frames. ], batch size: 56, lr: 4.24e-02, grad_scale: 64.0 2023-11-18 02:56:04,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=25733.333333333332, ans=0.1 2023-11-18 02:56:04,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=25733.333333333332, ans=0.1 2023-11-18 02:56:13,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=25733.333333333332, ans=0.025 2023-11-18 02:56:19,492 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.59 vs. limit=22.5 2023-11-18 02:56:27,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=25866.666666666668, ans=0.2 2023-11-18 02:56:33,992 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.29 vs. limit=15.0 2023-11-18 02:56:35,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=25866.666666666668, ans=0.125 2023-11-18 02:56:46,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=25933.333333333332, ans=0.0 2023-11-18 02:56:49,655 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 3900, loss[loss=0.1422, simple_loss=0.1422, pruned_loss=0.05497, audio_tagging_loss=0.0161, over 14328.00 frames. ], tot_loss[loss=0.1669, simple_loss=0.1597, pruned_loss=0.07298, audio_tagging_loss=0.01402, over 3035172.51 frames. 
], batch size: 55, lr: 4.24e-02, grad_scale: 64.0 2023-11-18 02:56:54,443 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.882e+01 1.063e+02 1.269e+02 1.447e+02 2.279e+02, threshold=2.539e+02, percent-clipped=0.0 2023-11-18 02:56:56,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=26000.0, ans=22.5 2023-11-18 02:57:12,971 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.44 vs. limit=22.5 2023-11-18 02:57:16,083 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.92 vs. limit=22.5 2023-11-18 02:57:23,372 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:57:44,892 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.91 vs. limit=12.0 2023-11-18 02:57:47,478 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 3950, loss[loss=0.1657, simple_loss=0.1522, pruned_loss=0.07252, audio_tagging_loss=0.01703, over 14254.00 frames. ], tot_loss[loss=0.1669, simple_loss=0.1597, pruned_loss=0.07286, audio_tagging_loss=0.01419, over 3037540.89 frames. ], batch size: 54, lr: 4.23e-02, grad_scale: 64.0 2023-11-18 02:57:49,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=26333.333333333332, ans=0.125 2023-11-18 02:57:54,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=26333.333333333332, ans=0.07 2023-11-18 02:57:55,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=26333.333333333332, ans=0.125 2023-11-18 02:58:05,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=26400.0, ans=0.125 2023-11-18 02:58:08,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=26466.666666666668, ans=0.005115942028985507 2023-11-18 02:58:10,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=26466.666666666668, ans=0.2 2023-11-18 02:58:25,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=26533.333333333332, ans=0.125 2023-11-18 02:58:30,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=26533.333333333332, ans=0.00510144927536232 2023-11-18 02:58:31,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=26600.0, ans=0.125 2023-11-18 02:58:46,900 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 4000, loss[loss=0.1731, simple_loss=0.1741, pruned_loss=0.07287, audio_tagging_loss=0.01319, over 16773.00 frames. ], tot_loss[loss=0.167, simple_loss=0.1598, pruned_loss=0.07287, audio_tagging_loss=0.01425, over 3041940.66 frames. 
], batch size: 62, lr: 4.23e-02, grad_scale: 64.0 2023-11-18 02:58:48,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=26666.666666666668, ans=0.125 2023-11-18 02:58:48,424 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.97 vs. limit=15.0 2023-11-18 02:58:51,136 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.575e+01 1.092e+02 1.270e+02 1.504e+02 2.237e+02, threshold=2.540e+02, percent-clipped=0.0 2023-11-18 02:58:54,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=26666.666666666668, ans=0.0 2023-11-18 02:59:07,675 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:59:12,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=26800.0, ans=0.2 2023-11-18 02:59:14,111 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.52 vs. limit=12.0 2023-11-18 02:59:37,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=26933.333333333332, ans=0.1 2023-11-18 02:59:43,017 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 4050, loss[loss=0.1494, simple_loss=0.1342, pruned_loss=0.06771, audio_tagging_loss=0.01465, over 15156.00 frames. ], tot_loss[loss=0.167, simple_loss=0.1597, pruned_loss=0.07278, audio_tagging_loss=0.01436, over 3038436.79 frames. ], batch size: 58, lr: 4.22e-02, grad_scale: 64.0 2023-11-18 02:59:46,339 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:59:54,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=27066.666666666668, ans=0.0 2023-11-18 03:00:08,557 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=15.0 2023-11-18 03:00:26,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=27200.0, ans=0.1 2023-11-18 03:00:41,274 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 4100, loss[loss=0.2032, simple_loss=0.2013, pruned_loss=0.0919, audio_tagging_loss=0.01066, over 16524.00 frames. ], tot_loss[loss=0.1676, simple_loss=0.1609, pruned_loss=0.07289, audio_tagging_loss=0.01425, over 3045659.87 frames. 
], batch size: 61, lr: 4.22e-02, grad_scale: 64.0 2023-11-18 03:00:43,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=27333.333333333332, ans=0.125 2023-11-18 03:00:45,567 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 1.139e+02 1.299e+02 1.567e+02 2.247e+02, threshold=2.597e+02, percent-clipped=0.0 2023-11-18 03:01:01,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=27400.0, ans=0.125 2023-11-18 03:01:01,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=27400.0, ans=0.00491304347826087 2023-11-18 03:01:04,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=27466.666666666668, ans=0.125 2023-11-18 03:01:06,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=27466.666666666668, ans=0.125 2023-11-18 03:01:09,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=27466.666666666668, ans=0.05 2023-11-18 03:01:24,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=27533.333333333332, ans=0.125 2023-11-18 03:01:29,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=27600.0, ans=0.125 2023-11-18 03:01:30,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=27600.0, ans=0.1 2023-11-18 03:01:37,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=27666.666666666668, ans=0.125 2023-11-18 03:01:38,129 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 4150, loss[loss=0.1833, simple_loss=0.1848, pruned_loss=0.08036, audio_tagging_loss=0.01049, over 14559.00 frames. ], tot_loss[loss=0.1669, simple_loss=0.1604, pruned_loss=0.07272, audio_tagging_loss=0.01402, over 3041007.83 frames. ], batch size: 56, lr: 4.21e-02, grad_scale: 64.0 2023-11-18 03:01:51,860 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.64 vs. limit=15.0 2023-11-18 03:02:16,966 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=18.25 vs. limit=15.0 2023-11-18 03:02:19,689 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 03:02:34,606 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 4200, loss[loss=0.1489, simple_loss=0.1574, pruned_loss=0.05924, audio_tagging_loss=0.011, over 15284.00 frames. ], tot_loss[loss=0.1655, simple_loss=0.1594, pruned_loss=0.07195, audio_tagging_loss=0.01391, over 3051956.11 frames. 
], batch size: 56, lr: 4.20e-02, grad_scale: 64.0 2023-11-18 03:02:38,425 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.21 vs. limit=22.5 2023-11-18 03:02:38,905 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.536e+01 1.064e+02 1.276e+02 1.442e+02 2.964e+02, threshold=2.551e+02, percent-clipped=1.0 2023-11-18 03:02:40,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=28000.0, ans=0.125 2023-11-18 03:02:46,025 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.27 vs. limit=15.0 2023-11-18 03:02:48,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=28066.666666666668, ans=0.004768115942028985 2023-11-18 03:03:11,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=28200.0, ans=0.125 2023-11-18 03:03:14,196 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.09 vs. limit=15.0 2023-11-18 03:03:14,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=28200.0, ans=0.1 2023-11-18 03:03:15,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=28200.0, ans=0.05 2023-11-18 03:03:21,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=28266.666666666668, ans=0.1 2023-11-18 03:03:23,661 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.26 vs. limit=10.0 2023-11-18 03:03:25,416 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.20 vs. limit=22.5 2023-11-18 03:03:32,456 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 4250, loss[loss=0.155, simple_loss=0.1516, pruned_loss=0.06655, audio_tagging_loss=0.01269, over 15231.00 frames. ], tot_loss[loss=0.1653, simple_loss=0.1591, pruned_loss=0.07188, audio_tagging_loss=0.01381, over 3046141.09 frames. ], batch size: 57, lr: 4.20e-02, grad_scale: 64.0 2023-11-18 03:03:38,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=28333.333333333332, ans=22.5 2023-11-18 03:03:49,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=28400.0, ans=0.125 2023-11-18 03:03:53,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=28466.666666666668, ans=0.125 2023-11-18 03:03:57,094 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.47 vs. 
limit=12.0 2023-11-18 03:04:11,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=28533.333333333332, ans=0.2 2023-11-18 03:04:28,445 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 4300, loss[loss=0.18, simple_loss=0.1649, pruned_loss=0.07945, audio_tagging_loss=0.01815, over 14995.00 frames. ], tot_loss[loss=0.1658, simple_loss=0.1598, pruned_loss=0.07228, audio_tagging_loss=0.01362, over 3048162.07 frames. ], batch size: 58, lr: 4.19e-02, grad_scale: 64.0 2023-11-18 03:04:32,714 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.625e+01 1.089e+02 1.255e+02 1.443e+02 2.387e+02, threshold=2.510e+02, percent-clipped=0.0 2023-11-18 03:04:46,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=28733.333333333332, ans=0.0 2023-11-18 03:04:51,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=28800.0, ans=10.0 2023-11-18 03:04:56,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=28800.0, ans=0.2 2023-11-18 03:05:25,325 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 4350, loss[loss=0.169, simple_loss=0.1645, pruned_loss=0.0729, audio_tagging_loss=0.01386, over 15257.00 frames. ], tot_loss[loss=0.1644, simple_loss=0.1582, pruned_loss=0.07156, audio_tagging_loss=0.01377, over 3045627.41 frames. ], batch size: 56, lr: 4.19e-02, grad_scale: 64.0 2023-11-18 03:05:47,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=29133.333333333332, ans=0.2 2023-11-18 03:05:48,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=29133.333333333332, ans=0.0 2023-11-18 03:05:53,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=29133.333333333332, ans=0.04949747468305833 2023-11-18 03:05:59,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=29200.0, ans=0.09899494936611666 2023-11-18 03:06:04,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=29200.0, ans=0.1 2023-11-18 03:06:04,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=29200.0, ans=0.125 2023-11-18 03:06:12,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=29266.666666666668, ans=0.0 2023-11-18 03:06:20,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=29266.666666666668, ans=0.1 2023-11-18 03:06:22,941 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 4400, loss[loss=0.1663, simple_loss=0.162, pruned_loss=0.07048, audio_tagging_loss=0.01489, over 16194.00 frames. ], tot_loss[loss=0.1645, simple_loss=0.1586, pruned_loss=0.07152, audio_tagging_loss=0.0137, over 3042233.06 frames. 
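], batch size: 59, lr: 4.18e-02, grad_scale: 64.0

The slowly decaying lr column is consistent with icefall's Eden schedule, lr = base_lr * ((step^2 + lr_batches^2) / lr_batches^2) ** -0.25 * ((epoch^2 + lr_epochs^2) / lr_epochs^2) ** -0.25: with base_lr=0.045 and lr_batches=7500 from this run's flags, and the epoch factor still ~1.0 this early in epoch 1, step 4400 gives 0.0418, matching the entry above. How the fractional epoch enters is an assumption here:

def eden_lr(step: float, epoch: float, base_lr: float = 0.045,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    f_step = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    f_epoch = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * f_step * f_epoch

print(round(eden_lr(4400, 0.0), 4))  # 0.0418 -> "lr: 4.18e-02" at batch 4400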
], batch size: 59, lr: 4.18e-02, grad_scale: 64.0 2023-11-18 03:06:27,739 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.425e+01 1.162e+02 1.302e+02 1.640e+02 3.175e+02, threshold=2.603e+02, percent-clipped=6.0 2023-11-18 03:06:36,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=29400.0, ans=0.125 2023-11-18 03:07:10,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=29600.0, ans=0.2 2023-11-18 03:07:19,270 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 4450, loss[loss=0.1321, simple_loss=0.1364, pruned_loss=0.05359, audio_tagging_loss=0.01032, over 14160.00 frames. ], tot_loss[loss=0.1637, simple_loss=0.1581, pruned_loss=0.07094, audio_tagging_loss=0.01365, over 3041609.32 frames. ], batch size: 54, lr: 4.17e-02, grad_scale: 64.0 2023-11-18 03:07:24,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=29666.666666666668, ans=0.1 2023-11-18 03:07:27,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=29666.666666666668, ans=0.125 2023-11-18 03:07:37,633 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=17.49 vs. limit=15.0 2023-11-18 03:07:51,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=29800.0, ans=0.125 2023-11-18 03:07:56,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=29866.666666666668, ans=0.004376811594202898 2023-11-18 03:08:01,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=29866.666666666668, ans=0.125 2023-11-18 03:08:02,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=29866.666666666668, ans=0.0 2023-11-18 03:08:15,493 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 4500, loss[loss=0.1345, simple_loss=0.1269, pruned_loss=0.05644, audio_tagging_loss=0.01461, over 14168.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.1581, pruned_loss=0.07105, audio_tagging_loss=0.01372, over 3045594.50 frames. 
], batch size: 57, lr: 4.17e-02, grad_scale: 64.0 2023-11-18 03:08:20,343 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.519e+01 1.091e+02 1.301e+02 1.544e+02 2.749e+02, threshold=2.602e+02, percent-clipped=1.0 2023-11-18 03:08:32,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=30066.666666666668, ans=0.004333333333333333 2023-11-18 03:08:39,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=30133.333333333332, ans=0.0 2023-11-18 03:08:44,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=30133.333333333332, ans=0.0 2023-11-18 03:08:56,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=30200.0, ans=0.125 2023-11-18 03:09:10,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=30266.666666666668, ans=0.1 2023-11-18 03:09:13,088 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 4550, loss[loss=0.1928, simple_loss=0.1804, pruned_loss=0.08702, audio_tagging_loss=0.01559, over 15291.00 frames. ], tot_loss[loss=0.1637, simple_loss=0.1582, pruned_loss=0.07103, audio_tagging_loss=0.01364, over 3040965.14 frames. ], batch size: 58, lr: 4.16e-02, grad_scale: 64.0 2023-11-18 03:09:38,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=30466.666666666668, ans=0.004246376811594203 2023-11-18 03:09:40,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=30466.666666666668, ans=0.004246376811594203 2023-11-18 03:09:40,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=30466.666666666668, ans=0.09899494936611666 2023-11-18 03:09:57,979 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 03:10:06,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=30600.0, ans=0.2 2023-11-18 03:10:07,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=30600.0, ans=0.025 2023-11-18 03:10:10,254 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 4600, loss[loss=0.195, simple_loss=0.1842, pruned_loss=0.09078, audio_tagging_loss=0.0121, over 14566.00 frames. ], tot_loss[loss=0.1625, simple_loss=0.1565, pruned_loss=0.07043, audio_tagging_loss=0.01377, over 3036480.31 frames. 
], batch size: 53, lr: 4.15e-02, grad_scale: 64.0 2023-11-18 03:10:14,501 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.773e+01 1.069e+02 1.267e+02 1.546e+02 2.795e+02, threshold=2.534e+02, percent-clipped=1.0 2023-11-18 03:10:22,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=30733.333333333332, ans=0.2 2023-11-18 03:10:25,772 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.24 vs. limit=10.0 2023-11-18 03:10:28,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=30733.333333333332, ans=0.0 2023-11-18 03:11:04,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=30933.333333333332, ans=0.125 2023-11-18 03:11:06,017 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 4650, loss[loss=0.151, simple_loss=0.1442, pruned_loss=0.06343, audio_tagging_loss=0.01546, over 14589.00 frames. ], tot_loss[loss=0.1622, simple_loss=0.1563, pruned_loss=0.07011, audio_tagging_loss=0.01394, over 3046477.57 frames. ], batch size: 56, lr: 4.15e-02, grad_scale: 64.0 2023-11-18 03:11:07,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=31000.0, ans=0.125 2023-11-18 03:11:25,504 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.39 vs. limit=15.0 2023-11-18 03:11:47,932 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.78 vs. limit=15.0 2023-11-18 03:11:48,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=31200.0, ans=0.125 2023-11-18 03:11:52,087 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.69 vs. limit=22.5 2023-11-18 03:11:56,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=31266.666666666668, ans=0.2 2023-11-18 03:12:02,407 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 4700, loss[loss=0.1475, simple_loss=0.1448, pruned_loss=0.05826, audio_tagging_loss=0.01688, over 15658.00 frames. ], tot_loss[loss=0.1628, simple_loss=0.1569, pruned_loss=0.07018, audio_tagging_loss=0.01412, over 3044194.04 frames. ], batch size: 58, lr: 4.14e-02, grad_scale: 64.0 2023-11-18 03:12:07,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=31333.333333333332, ans=0.1 2023-11-18 03:12:07,955 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.482e+01 1.109e+02 1.199e+02 1.387e+02 2.796e+02, threshold=2.398e+02, percent-clipped=1.0 2023-11-18 03:12:13,781 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.60 vs. limit=15.0 2023-11-18 03:12:24,633 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.68 vs. 
limit=15.0 2023-11-18 03:12:59,573 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 4750, loss[loss=0.2171, simple_loss=0.2155, pruned_loss=0.09484, audio_tagging_loss=0.01448, over 15613.00 frames. ], tot_loss[loss=0.1641, simple_loss=0.1583, pruned_loss=0.07092, audio_tagging_loss=0.01407, over 3045952.53 frames. ], batch size: 58, lr: 4.14e-02, grad_scale: 64.0 2023-11-18 03:13:22,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=31800.0, ans=0.003956521739130435 2023-11-18 03:13:23,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=31800.0, ans=0.04949747468305833 2023-11-18 03:13:33,792 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.13 vs. limit=12.0 2023-11-18 03:13:35,739 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.98 vs. limit=22.5 2023-11-18 03:13:47,400 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.46 vs. limit=15.0 2023-11-18 03:13:55,721 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 4800, loss[loss=0.1928, simple_loss=0.1809, pruned_loss=0.08658, audio_tagging_loss=0.01576, over 15979.00 frames. ], tot_loss[loss=0.1651, simple_loss=0.1592, pruned_loss=0.07141, audio_tagging_loss=0.01407, over 3048635.77 frames. ], batch size: 61, lr: 4.13e-02, grad_scale: 64.0 2023-11-18 03:13:59,888 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.623e+01 1.059e+02 1.265e+02 1.558e+02 2.176e+02, threshold=2.529e+02, percent-clipped=0.0 2023-11-18 03:14:08,468 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.58 vs. limit=6.0 2023-11-18 03:14:13,654 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.578e+00 2023-11-18 03:14:36,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=32200.0, ans=0.125 2023-11-18 03:14:51,909 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 4850, loss[loss=0.1986, simple_loss=0.1944, pruned_loss=0.08796, audio_tagging_loss=0.01345, over 15524.00 frames. ], tot_loss[loss=0.1652, simple_loss=0.1598, pruned_loss=0.0712, audio_tagging_loss=0.01409, over 3045817.43 frames. ], batch size: 56, lr: 4.12e-02, grad_scale: 64.0 2023-11-18 03:15:06,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=32400.0, ans=0.0 2023-11-18 03:15:11,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=32400.0, ans=0.2 2023-11-18 03:15:13,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=32400.0, ans=0.0 2023-11-18 03:15:21,933 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.26 vs. 
limit=10.0 2023-11-18 03:15:28,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=32533.333333333332, ans=0.05 2023-11-18 03:15:33,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=32533.333333333332, ans=0.0 2023-11-18 03:15:39,215 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.69 vs. limit=15.0 2023-11-18 03:15:41,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=32600.0, ans=0.125 2023-11-18 03:15:46,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=32600.0, ans=0.1 2023-11-18 03:15:48,659 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 4900, loss[loss=0.1924, simple_loss=0.1864, pruned_loss=0.08314, audio_tagging_loss=0.01602, over 15530.00 frames. ], tot_loss[loss=0.1635, simple_loss=0.1582, pruned_loss=0.07028, audio_tagging_loss=0.01411, over 3044742.88 frames. ], batch size: 57, lr: 4.12e-02, grad_scale: 64.0 2023-11-18 03:15:52,880 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.066e+01 1.045e+02 1.197e+02 1.386e+02 2.012e+02, threshold=2.394e+02, percent-clipped=0.0 2023-11-18 03:16:09,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=32800.0, ans=0.1 2023-11-18 03:16:10,295 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 03:16:11,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=32800.0, ans=0.5 2023-11-18 03:16:19,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=32800.0, ans=0.2 2023-11-18 03:16:43,794 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 4950, loss[loss=0.2325, simple_loss=0.2208, pruned_loss=0.1088, audio_tagging_loss=0.01338, over 15764.00 frames. ], tot_loss[loss=0.1644, simple_loss=0.1595, pruned_loss=0.07089, audio_tagging_loss=0.01379, over 3040178.66 frames. ], batch size: 58, lr: 4.11e-02, grad_scale: 64.0 2023-11-18 03:16:46,584 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.22 vs. limit=12.0 2023-11-18 03:16:52,152 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.17 vs. 
limit=12.0 2023-11-18 03:16:57,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=33066.666666666664, ans=0.125 2023-11-18 03:17:04,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=33066.666666666664, ans=0.2 2023-11-18 03:17:12,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=33133.333333333336, ans=10.0 2023-11-18 03:17:14,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=33133.333333333336, ans=0.125 2023-11-18 03:17:18,828 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.51 vs. limit=6.0 2023-11-18 03:17:40,578 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 5000, loss[loss=0.09154, simple_loss=0.08282, pruned_loss=0.03406, audio_tagging_loss=0.01606, over 14600.00 frames. ], tot_loss[loss=0.1624, simple_loss=0.158, pruned_loss=0.06978, audio_tagging_loss=0.01363, over 3041578.11 frames. ], batch size: 57, lr: 4.10e-02, grad_scale: 64.0 2023-11-18 03:17:45,401 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 1.071e+02 1.252e+02 1.412e+02 1.907e+02, threshold=2.505e+02, percent-clipped=0.0 2023-11-18 03:17:54,611 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.19 vs. limit=15.0 2023-11-18 03:18:09,424 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.79 vs. limit=22.5 2023-11-18 03:18:12,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=33466.666666666664, ans=0.125 2023-11-18 03:18:26,221 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.79 vs. limit=15.0 2023-11-18 03:18:38,057 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 5050, loss[loss=0.1773, simple_loss=0.1798, pruned_loss=0.07626, audio_tagging_loss=0.01115, over 15602.00 frames. ], tot_loss[loss=0.1606, simple_loss=0.1562, pruned_loss=0.06895, audio_tagging_loss=0.01358, over 3035150.44 frames. ], batch size: 55, lr: 4.10e-02, grad_scale: 64.0 2023-11-18 03:18:39,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=33666.666666666664, ans=0.2 2023-11-18 03:18:46,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=33666.666666666664, ans=0.1 2023-11-18 03:19:00,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=33800.0, ans=0.1 2023-11-18 03:19:02,922 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.17 vs. 
limit=15.0 2023-11-18 03:19:19,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=33866.666666666664, ans=0.0035072463768115953 2023-11-18 03:19:30,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=33933.333333333336, ans=0.125 2023-11-18 03:19:33,397 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 5100, loss[loss=0.2094, simple_loss=0.2123, pruned_loss=0.09129, audio_tagging_loss=0.01192, over 14808.00 frames. ], tot_loss[loss=0.1599, simple_loss=0.1557, pruned_loss=0.0686, audio_tagging_loss=0.01349, over 3034353.52 frames. ], batch size: 55, lr: 4.09e-02, grad_scale: 64.0 2023-11-18 03:19:37,212 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.96 vs. limit=15.0 2023-11-18 03:19:37,617 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 1.065e+02 1.271e+02 1.460e+02 2.434e+02, threshold=2.541e+02, percent-clipped=0.0 2023-11-18 03:19:37,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=34000.0, ans=0.125 2023-11-18 03:19:42,427 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.10 vs. limit=15.0 2023-11-18 03:20:05,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=34133.333333333336, ans=0.0 2023-11-18 03:20:20,984 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.30 vs. limit=22.5 2023-11-18 03:20:29,376 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 5150, loss[loss=0.1297, simple_loss=0.1353, pruned_loss=0.05003, audio_tagging_loss=0.01207, over 15232.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.1572, pruned_loss=0.06901, audio_tagging_loss=0.01345, over 3028505.70 frames. ], batch size: 56, lr: 4.09e-02, grad_scale: 64.0 2023-11-18 03:20:30,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=34333.333333333336, ans=0.125 2023-11-18 03:20:37,681 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 03:20:42,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=34400.0, ans=0.125 2023-11-18 03:20:49,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=34400.0, ans=0.125 2023-11-18 03:20:56,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=34466.666666666664, ans=0.125 2023-11-18 03:21:18,371 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.48 vs. limit=6.0 2023-11-18 03:21:26,263 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 5200, loss[loss=0.1122, simple_loss=0.1036, pruned_loss=0.04681, audio_tagging_loss=0.01357, over 15854.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.1574, pruned_loss=0.06892, audio_tagging_loss=0.01348, over 3036399.97 frames. 
], batch size: 62, lr: 4.08e-02, grad_scale: 64.0 2023-11-18 03:21:30,529 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.892e+01 1.044e+02 1.171e+02 1.375e+02 2.529e+02, threshold=2.342e+02, percent-clipped=0.0 2023-11-18 03:21:48,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=34800.0, ans=0.0 2023-11-18 03:21:57,547 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.36 vs. limit=22.5 2023-11-18 03:22:22,079 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 5250, loss[loss=0.2152, simple_loss=0.1997, pruned_loss=0.102, audio_tagging_loss=0.01343, over 13439.00 frames. ], tot_loss[loss=0.1611, simple_loss=0.1572, pruned_loss=0.06899, audio_tagging_loss=0.01353, over 3038606.62 frames. ], batch size: 50, lr: 4.07e-02, grad_scale: 64.0 2023-11-18 03:22:36,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=35066.666666666664, ans=0.5 2023-11-18 03:22:55,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=35200.0, ans=0.125 2023-11-18 03:23:04,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=35200.0, ans=0.0 2023-11-18 03:23:07,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=35266.666666666664, ans=0.125 2023-11-18 03:23:18,038 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 5300, loss[loss=0.1692, simple_loss=0.1606, pruned_loss=0.07421, audio_tagging_loss=0.01472, over 15620.00 frames. ], tot_loss[loss=0.161, simple_loss=0.1573, pruned_loss=0.06899, audio_tagging_loss=0.01339, over 3038391.27 frames. ], batch size: 60, lr: 4.07e-02, grad_scale: 64.0 2023-11-18 03:23:22,277 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.314e+01 1.054e+02 1.180e+02 1.432e+02 2.621e+02, threshold=2.360e+02, percent-clipped=2.0 2023-11-18 03:24:12,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=35600.0, ans=0.2 2023-11-18 03:24:14,618 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 5350, loss[loss=0.1469, simple_loss=0.1608, pruned_loss=0.05796, audio_tagging_loss=0.008527, over 14993.00 frames. ], tot_loss[loss=0.161, simple_loss=0.1578, pruned_loss=0.06878, audio_tagging_loss=0.01336, over 3035119.97 frames. ], batch size: 57, lr: 4.06e-02, grad_scale: 64.0 2023-11-18 03:24:34,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=35733.333333333336, ans=0.0 2023-11-18 03:24:47,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=35866.666666666664, ans=0.125 2023-11-18 03:24:47,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=35866.666666666664, ans=0.125 2023-11-18 03:24:58,694 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.56 vs. limit=12.0 2023-11-18 03:25:10,836 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 5400, loss[loss=0.1802, simple_loss=0.1673, pruned_loss=0.08011, audio_tagging_loss=0.01647, over 15697.00 frames. 
], tot_loss[loss=0.1612, simple_loss=0.158, pruned_loss=0.06877, audio_tagging_loss=0.0134, over 3041587.67 frames. ], batch size: 58, lr: 4.05e-02, grad_scale: 64.0 2023-11-18 03:25:13,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=36000.0, ans=0.125 2023-11-18 03:25:15,063 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.377e+01 1.086e+02 1.314e+02 1.571e+02 2.162e+02, threshold=2.627e+02, percent-clipped=0.0 2023-11-18 03:25:21,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=36066.666666666664, ans=0.05 2023-11-18 03:25:23,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=36066.666666666664, ans=0.1 2023-11-18 03:25:25,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=36066.666666666664, ans=0.2 2023-11-18 03:25:38,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=36133.333333333336, ans=0.125 2023-11-18 03:25:39,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=36133.333333333336, ans=0.0 2023-11-18 03:25:40,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=36133.333333333336, ans=0.125 2023-11-18 03:25:48,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=36200.0, ans=0.125 2023-11-18 03:25:48,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=36200.0, ans=0.2 2023-11-18 03:25:59,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=36266.666666666664, ans=0.1 2023-11-18 03:26:06,203 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 5450, loss[loss=0.1504, simple_loss=0.1516, pruned_loss=0.0641, audio_tagging_loss=0.01055, over 15264.00 frames. ], tot_loss[loss=0.1605, simple_loss=0.1569, pruned_loss=0.06843, audio_tagging_loss=0.01359, over 3044612.69 frames. ], batch size: 57, lr: 4.05e-02, grad_scale: 64.0 2023-11-18 03:26:10,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=36333.333333333336, ans=0.2 2023-11-18 03:26:30,822 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=21.98 vs. 
limit=15.0 2023-11-18 03:26:31,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=36466.666666666664, ans=0.125 2023-11-18 03:26:43,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=36533.333333333336, ans=0.2 2023-11-18 03:26:51,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=36600.0, ans=12.0 2023-11-18 03:26:59,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=36600.0, ans=0.00291304347826087 2023-11-18 03:27:01,029 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.91 vs. limit=15.0 2023-11-18 03:27:03,289 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 5500, loss[loss=0.1385, simple_loss=0.1243, pruned_loss=0.0615, audio_tagging_loss=0.01492, over 15962.00 frames. ], tot_loss[loss=0.16, simple_loss=0.1563, pruned_loss=0.06816, audio_tagging_loss=0.01372, over 3045569.72 frames. ], batch size: 60, lr: 4.04e-02, grad_scale: 64.0 2023-11-18 03:27:07,489 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.157e+01 1.024e+02 1.184e+02 1.343e+02 1.900e+02, threshold=2.368e+02, percent-clipped=0.0 2023-11-18 03:27:16,834 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.17 vs. limit=8.0 2023-11-18 03:27:34,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=36800.0, ans=15.0 2023-11-18 03:27:47,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=36933.333333333336, ans=0.0 2023-11-18 03:27:49,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=36933.333333333336, ans=0.1 2023-11-18 03:27:58,586 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 5550, loss[loss=0.138, simple_loss=0.1275, pruned_loss=0.05846, audio_tagging_loss=0.01574, over 16077.00 frames. ], tot_loss[loss=0.1596, simple_loss=0.1553, pruned_loss=0.06797, audio_tagging_loss=0.01395, over 3047579.59 frames. ], batch size: 61, lr: 4.03e-02, grad_scale: 64.0 2023-11-18 03:28:05,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=37000.0, ans=0.125 2023-11-18 03:28:43,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=37266.666666666664, ans=0.125 2023-11-18 03:28:54,756 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 5600, loss[loss=0.1512, simple_loss=0.1446, pruned_loss=0.06467, audio_tagging_loss=0.01422, over 15001.00 frames. ], tot_loss[loss=0.1577, simple_loss=0.1537, pruned_loss=0.06673, audio_tagging_loss=0.01416, over 3046265.80 frames. 
], batch size: 56, lr: 4.03e-02, grad_scale: 64.0 2023-11-18 03:28:59,529 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.818e+01 1.034e+02 1.195e+02 1.444e+02 2.133e+02, threshold=2.390e+02, percent-clipped=0.0 2023-11-18 03:29:00,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=37333.333333333336, ans=0.2 2023-11-18 03:29:08,404 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.05 vs. limit=15.0 2023-11-18 03:29:10,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=37400.0, ans=0.125 2023-11-18 03:29:24,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=37466.666666666664, ans=0.2 2023-11-18 03:29:28,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=37533.333333333336, ans=0.125 2023-11-18 03:29:29,931 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.52 vs. limit=15.0 2023-11-18 03:29:35,259 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 03:29:35,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=37533.333333333336, ans=0.125 2023-11-18 03:29:44,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=37600.0, ans=0.125 2023-11-18 03:29:51,788 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 5650, loss[loss=0.1817, simple_loss=0.1726, pruned_loss=0.08103, audio_tagging_loss=0.01436, over 14186.00 frames. ], tot_loss[loss=0.1557, simple_loss=0.1515, pruned_loss=0.06565, audio_tagging_loss=0.01424, over 3049433.73 frames. ], batch size: 53, lr: 4.02e-02, grad_scale: 128.0 2023-11-18 03:30:04,804 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=8.740e+00 2023-11-18 03:30:13,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=37800.0, ans=0.125 2023-11-18 03:30:19,304 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.95 vs. limit=6.0 2023-11-18 03:30:25,399 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.74 vs. limit=15.0 2023-11-18 03:30:28,665 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.10 vs. limit=15.0 2023-11-18 03:30:47,187 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 5700, loss[loss=0.1327, simple_loss=0.1284, pruned_loss=0.05324, audio_tagging_loss=0.01525, over 15219.00 frames. 
], tot_loss[loss=0.1546, simple_loss=0.1505, pruned_loss=0.0651, audio_tagging_loss=0.01424, over 3047565.68 frames. ], batch size: 56, lr: 4.02e-02, grad_scale: 64.0 2023-11-18 03:30:50,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=38000.0, ans=0.125 2023-11-18 03:30:52,373 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.432e+01 1.093e+02 1.259e+02 1.491e+02 2.385e+02, threshold=2.519e+02, percent-clipped=0.0 2023-11-18 03:30:56,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=38066.666666666664, ans=0.0025942028985507255 2023-11-18 03:31:00,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=38066.666666666664, ans=0.2 2023-11-18 03:31:05,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=38066.666666666664, ans=0.025 2023-11-18 03:31:12,749 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.87 vs. limit=15.0 2023-11-18 03:31:12,798 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.38 vs. limit=22.5 2023-11-18 03:31:22,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=38200.0, ans=0.1 2023-11-18 03:31:27,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=38200.0, ans=0.0 2023-11-18 03:31:33,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=38266.666666666664, ans=0.0 2023-11-18 03:31:38,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=38266.666666666664, ans=0.0 2023-11-18 03:31:42,339 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 5750, loss[loss=0.1629, simple_loss=0.1609, pruned_loss=0.07052, audio_tagging_loss=0.01195, over 15148.00 frames. ], tot_loss[loss=0.1542, simple_loss=0.1503, pruned_loss=0.06505, audio_tagging_loss=0.01405, over 3050577.00 frames. ], batch size: 56, lr: 4.01e-02, grad_scale: 32.0 2023-11-18 03:31:47,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=38333.333333333336, ans=0.1 2023-11-18 03:31:51,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=38333.333333333336, ans=0.0 2023-11-18 03:32:07,195 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 03:32:14,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=38466.666666666664, ans=0.125 2023-11-18 03:32:31,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=38600.0, ans=0.1 2023-11-18 03:32:39,949 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 5800, loss[loss=0.1555, simple_loss=0.1682, pruned_loss=0.06462, audio_tagging_loss=0.00681, over 14863.00 frames. 
], tot_loss[loss=0.1545, simple_loss=0.1509, pruned_loss=0.06525, audio_tagging_loss=0.01377, over 3054499.72 frames. ], batch size: 54, lr: 4.00e-02, grad_scale: 32.0 2023-11-18 03:32:46,931 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 1.065e+02 1.200e+02 1.362e+02 2.023e+02, threshold=2.399e+02, percent-clipped=0.0 2023-11-18 03:32:49,922 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.55 vs. limit=10.0 2023-11-18 03:33:08,463 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 03:33:17,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=38866.666666666664, ans=0.025 2023-11-18 03:33:25,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=38933.333333333336, ans=0.025 2023-11-18 03:33:30,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=38933.333333333336, ans=0.0 2023-11-18 03:33:35,988 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 5850, loss[loss=0.1636, simple_loss=0.16, pruned_loss=0.06877, audio_tagging_loss=0.01486, over 15931.00 frames. ], tot_loss[loss=0.1552, simple_loss=0.152, pruned_loss=0.06564, audio_tagging_loss=0.01352, over 3050924.41 frames. ], batch size: 59, lr: 4.00e-02, grad_scale: 32.0 2023-11-18 03:33:40,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=39000.0, ans=0.125 2023-11-18 03:33:43,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=39000.0, ans=0.125 2023-11-18 03:33:44,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=39000.0, ans=0.125 2023-11-18 03:33:58,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=39133.333333333336, ans=0.125 2023-11-18 03:34:19,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=39200.0, ans=0.1 2023-11-18 03:34:31,924 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 5900, loss[loss=0.1745, simple_loss=0.1737, pruned_loss=0.07323, audio_tagging_loss=0.01447, over 15654.00 frames. ], tot_loss[loss=0.1552, simple_loss=0.1522, pruned_loss=0.06566, audio_tagging_loss=0.01345, over 3046296.93 frames. ], batch size: 59, lr: 3.99e-02, grad_scale: 32.0 2023-11-18 03:34:38,801 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.189e+01 1.114e+02 1.332e+02 1.512e+02 2.705e+02, threshold=2.665e+02, percent-clipped=2.0 2023-11-18 03:34:41,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=39333.333333333336, ans=0.1 2023-11-18 03:35:28,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=39666.666666666664, ans=0.09899494936611666 2023-11-18 03:35:28,897 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 5950, loss[loss=0.1828, simple_loss=0.1708, pruned_loss=0.08673, audio_tagging_loss=0.01066, over 14979.00 frames. 
], tot_loss[loss=0.1549, simple_loss=0.1521, pruned_loss=0.06549, audio_tagging_loss=0.0134, over 3049559.65 frames. ], batch size: 54, lr: 3.98e-02, grad_scale: 32.0 2023-11-18 03:35:35,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=39666.666666666664, ans=0.0 2023-11-18 03:35:35,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=39666.666666666664, ans=0.125 2023-11-18 03:36:16,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=39933.333333333336, ans=0.125 2023-11-18 03:36:22,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=39933.333333333336, ans=0.125 2023-11-18 03:36:24,713 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 6000, loss[loss=0.249, simple_loss=0.2468, pruned_loss=0.1172, audio_tagging_loss=0.008357, over 15514.00 frames. ], tot_loss[loss=0.1547, simple_loss=0.1518, pruned_loss=0.06537, audio_tagging_loss=0.01345, over 3056686.95 frames. ], batch size: 55, lr: 3.98e-02, grad_scale: 32.0 2023-11-18 03:36:24,714 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-18 03:36:43,627 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.7751, 4.1502, 2.3824, 4.1871], device='cuda:3') 2023-11-18 03:36:58,779 INFO [train_asr.py:1147] (3/4) Epoch 1, validation: loss=0.1009, simple_loss=0.07718, pruned_loss=0.02169, audio_tagging_loss=0.04066, over 4681554.00 frames. 2023-11-18 03:36:58,780 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-18 03:37:03,674 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.57 vs. limit=6.0 2023-11-18 03:37:05,259 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 1.087e+02 1.275e+02 1.499e+02 2.354e+02, threshold=2.549e+02, percent-clipped=0.0 2023-11-18 03:37:25,617 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.96 vs. limit=22.5 2023-11-18 03:37:39,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=40200.0, ans=0.125 2023-11-18 03:37:39,371 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.73 vs. limit=15.0 2023-11-18 03:37:40,583 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 03:37:47,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=40266.666666666664, ans=0.1 2023-11-18 03:37:55,207 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.36 vs. limit=15.0 2023-11-18 03:37:55,889 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 6050, loss[loss=0.1361, simple_loss=0.1211, pruned_loss=0.06078, audio_tagging_loss=0.01478, over 14409.00 frames. ], tot_loss[loss=0.1537, simple_loss=0.1512, pruned_loss=0.06466, audio_tagging_loss=0.0134, over 3053852.19 frames. ], batch size: 58, lr: 3.97e-02, grad_scale: 32.0 2023-11-18 03:37:58,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=40333.333333333336, ans=0.125 2023-11-18 03:38:06,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=40400.0, ans=10.0 2023-11-18 03:38:15,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=40400.0, ans=0.1 2023-11-18 03:38:26,497 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.98 vs. limit=15.0 2023-11-18 03:38:45,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=40600.0, ans=0.0 2023-11-18 03:38:51,133 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.45 vs. limit=22.5 2023-11-18 03:38:52,467 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 6100, loss[loss=0.1537, simple_loss=0.1428, pruned_loss=0.06387, audio_tagging_loss=0.01846, over 15566.00 frames. ], tot_loss[loss=0.1542, simple_loss=0.1517, pruned_loss=0.06507, audio_tagging_loss=0.01331, over 3055301.59 frames. ], batch size: 61, lr: 3.96e-02, grad_scale: 32.0 2023-11-18 03:38:58,694 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.42 vs. limit=6.0 2023-11-18 03:38:58,886 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.303e+01 1.093e+02 1.234e+02 1.511e+02 2.648e+02, threshold=2.468e+02, percent-clipped=3.0 2023-11-18 03:39:05,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=40733.333333333336, ans=0.125 2023-11-18 03:39:20,276 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.99 vs. limit=15.0 2023-11-18 03:39:30,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=40866.666666666664, ans=0.2 2023-11-18 03:39:48,157 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 6150, loss[loss=0.1225, simple_loss=0.1119, pruned_loss=0.04807, audio_tagging_loss=0.01849, over 15276.00 frames. ], tot_loss[loss=0.154, simple_loss=0.1512, pruned_loss=0.06497, audio_tagging_loss=0.01339, over 3053161.25 frames. 
], batch size: 59, lr: 3.96e-02, grad_scale: 32.0 2023-11-18 03:39:54,620 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.69 vs. limit=22.5 2023-11-18 03:39:59,456 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.85 vs. limit=15.0 2023-11-18 03:40:33,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=41266.666666666664, ans=0.0 2023-11-18 03:40:43,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=41266.666666666664, ans=0.001898550724637682 2023-11-18 03:40:45,699 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 6200, loss[loss=0.1469, simple_loss=0.144, pruned_loss=0.06202, audio_tagging_loss=0.01289, over 15188.00 frames. ], tot_loss[loss=0.1539, simple_loss=0.151, pruned_loss=0.06483, audio_tagging_loss=0.01358, over 3046268.90 frames. ], batch size: 56, lr: 3.95e-02, grad_scale: 32.0 2023-11-18 03:40:48,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=41333.333333333336, ans=0.125 2023-11-18 03:40:53,128 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.054e+01 1.077e+02 1.264e+02 1.430e+02 2.412e+02, threshold=2.529e+02, percent-clipped=0.0 2023-11-18 03:41:21,378 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.80 vs. limit=15.0 2023-11-18 03:41:32,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=41600.0, ans=0.2 2023-11-18 03:41:39,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=41600.0, ans=0.5 2023-11-18 03:41:43,086 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 6250, loss[loss=0.1192, simple_loss=0.1095, pruned_loss=0.04834, audio_tagging_loss=0.01611, over 15350.00 frames. ], tot_loss[loss=0.1546, simple_loss=0.1515, pruned_loss=0.06504, audio_tagging_loss=0.01377, over 3041057.01 frames. ], batch size: 60, lr: 3.94e-02, grad_scale: 32.0 2023-11-18 03:41:44,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=41666.666666666664, ans=0.1 2023-11-18 03:41:58,451 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 03:42:03,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=41733.333333333336, ans=0.125 2023-11-18 03:42:09,382 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0 2023-11-18 03:42:24,841 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.08 vs. 
limit=15.0 2023-11-18 03:42:29,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=41933.333333333336, ans=0.0 2023-11-18 03:42:32,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=41933.333333333336, ans=0.1 2023-11-18 03:42:39,082 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 6300, loss[loss=0.1603, simple_loss=0.1446, pruned_loss=0.06918, audio_tagging_loss=0.01881, over 14225.00 frames. ], tot_loss[loss=0.1564, simple_loss=0.1534, pruned_loss=0.06584, audio_tagging_loss=0.01391, over 3039730.04 frames. ], batch size: 59, lr: 3.94e-02, grad_scale: 32.0 2023-11-18 03:42:46,036 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.073e+01 1.061e+02 1.176e+02 1.388e+02 2.867e+02, threshold=2.352e+02, percent-clipped=1.0 2023-11-18 03:42:47,635 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0 2023-11-18 03:42:53,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=42066.666666666664, ans=0.0017246376811594216 2023-11-18 03:42:55,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=42066.666666666664, ans=0.2 2023-11-18 03:43:23,265 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.25 vs. limit=8.0 2023-11-18 03:43:23,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=42266.666666666664, ans=0.1 2023-11-18 03:43:34,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=42266.666666666664, ans=0.5 2023-11-18 03:43:36,573 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 6350, loss[loss=0.1422, simple_loss=0.1369, pruned_loss=0.05583, audio_tagging_loss=0.01796, over 14055.00 frames. ], tot_loss[loss=0.1563, simple_loss=0.1532, pruned_loss=0.06574, audio_tagging_loss=0.01396, over 3043714.92 frames. ], batch size: 52, lr: 3.93e-02, grad_scale: 32.0 2023-11-18 03:43:38,160 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.11 vs. limit=15.0 2023-11-18 03:43:38,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=42333.333333333336, ans=0.09899494936611666 2023-11-18 03:43:46,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=42333.333333333336, ans=15.0 2023-11-18 03:43:55,500 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.29 vs. limit=22.5 2023-11-18 03:44:14,447 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=12.0 2023-11-18 03:44:34,105 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 6400, loss[loss=0.138, simple_loss=0.1243, pruned_loss=0.05702, audio_tagging_loss=0.01883, over 14572.00 frames. ], tot_loss[loss=0.158, simple_loss=0.1549, pruned_loss=0.0667, audio_tagging_loss=0.01385, over 3052731.95 frames. 
], batch size: 59, lr: 3.92e-02, grad_scale: 32.0 2023-11-18 03:44:40,525 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.367e+01 1.120e+02 1.287e+02 1.674e+02 2.598e+02, threshold=2.575e+02, percent-clipped=2.0 2023-11-18 03:44:57,000 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.65 vs. limit=15.0 2023-11-18 03:45:01,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=42800.0, ans=0.125 2023-11-18 03:45:30,272 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 6450, loss[loss=0.165, simple_loss=0.1577, pruned_loss=0.07004, audio_tagging_loss=0.0161, over 15775.00 frames. ], tot_loss[loss=0.1573, simple_loss=0.1541, pruned_loss=0.06627, audio_tagging_loss=0.014, over 3044396.73 frames. ], batch size: 59, lr: 3.92e-02, grad_scale: 32.0 2023-11-18 03:45:53,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=43133.333333333336, ans=0.125 2023-11-18 03:46:04,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=43200.0, ans=0.035 2023-11-18 03:46:18,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=43266.666666666664, ans=0.125 2023-11-18 03:46:27,437 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 6500, loss[loss=0.2038, simple_loss=0.2041, pruned_loss=0.09366, audio_tagging_loss=0.008096, over 14466.00 frames. ], tot_loss[loss=0.1577, simple_loss=0.1546, pruned_loss=0.06643, audio_tagging_loss=0.01398, over 3050102.16 frames. ], batch size: 58, lr: 3.91e-02, grad_scale: 32.0 2023-11-18 03:46:34,390 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 9.242e+01 1.070e+02 1.244e+02 1.503e+02 2.306e+02, threshold=2.488e+02, percent-clipped=0.0 2023-11-18 03:46:34,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=43333.333333333336, ans=0.0 2023-11-18 03:46:35,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=43333.333333333336, ans=0.125 2023-11-18 03:47:24,042 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 6550, loss[loss=0.149, simple_loss=0.1362, pruned_loss=0.06551, audio_tagging_loss=0.01535, over 14705.00 frames. ], tot_loss[loss=0.157, simple_loss=0.1545, pruned_loss=0.06612, audio_tagging_loss=0.01366, over 3048347.82 frames. ], batch size: 56, lr: 3.91e-02, grad_scale: 32.0 2023-11-18 03:47:29,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=43666.666666666664, ans=0.0 2023-11-18 03:47:43,087 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.15 vs. limit=22.5 2023-11-18 03:48:04,532 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.77 vs. limit=15.0 2023-11-18 03:48:17,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=43933.333333333336, ans=0.09899494936611666 2023-11-18 03:48:20,983 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 6600, loss[loss=0.2035, simple_loss=0.2232, pruned_loss=0.08238, audio_tagging_loss=0.009529, over 15680.00 frames. 
2023-11-18 03:48:28,032 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.463e+01 1.068e+02 1.215e+02 1.424e+02 2.055e+02, threshold=2.430e+02, percent-clipped=0.0
2023-11-18 03:48:34,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=44066.666666666664, ans=0.0012898550724637688
2023-11-18 03:48:47,070 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.59 vs. limit=22.5
2023-11-18 03:48:54,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=44200.0, ans=0.125
2023-11-18 03:49:06,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=44266.666666666664, ans=0.0
2023-11-18 03:49:09,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=44266.666666666664, ans=0.1
2023-11-18 03:49:11,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=44266.666666666664, ans=15.0
2023-11-18 03:49:17,877 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 6650, loss[loss=0.1903, simple_loss=0.1906, pruned_loss=0.08228, audio_tagging_loss=0.01274, over 16031.00 frames. ], tot_loss[loss=0.1562, simple_loss=0.1542, pruned_loss=0.06569, audio_tagging_loss=0.01338, over 3046933.45 frames. ], batch size: 61, lr: 3.89e-02, grad_scale: 32.0
2023-11-18 03:49:32,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=44400.0, ans=0.0012173913043478264
2023-11-18 03:49:52,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=44533.333333333336, ans=0.125
2023-11-18 03:49:58,173 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.479e+00
2023-11-18 03:50:15,178 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 6700, loss[loss=0.1525, simple_loss=0.1543, pruned_loss=0.06214, audio_tagging_loss=0.01325, over 15524.00 frames. ], tot_loss[loss=0.1543, simple_loss=0.1524, pruned_loss=0.06465, audio_tagging_loss=0.01351, over 3044829.05 frames. ], batch size: 57, lr: 3.89e-02, grad_scale: 32.0
2023-11-18 03:50:18,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=44666.666666666664, ans=0.09899494936611666
2023-11-18 03:50:21,688 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.129e+01 1.020e+02 1.157e+02 1.284e+02 2.181e+02, threshold=2.314e+02, percent-clipped=0.0
2023-11-18 03:50:26,648 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.29 vs. limit=22.5
2023-11-18 03:50:35,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=44733.333333333336, ans=0.0
2023-11-18 03:50:40,367 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.16 vs. limit=12.0
2023-11-18 03:50:40,459 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.47 vs. limit=15.0
2023-11-18 03:51:11,213 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 6750, loss[loss=0.13, simple_loss=0.1231, pruned_loss=0.05385, audio_tagging_loss=0.01455, over 14843.00 frames. ], tot_loss[loss=0.1542, simple_loss=0.1524, pruned_loss=0.06454, audio_tagging_loss=0.01342, over 3042307.08 frames. ], batch size: 58, lr: 3.88e-02, grad_scale: 32.0
2023-11-18 03:51:19,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=45000.0, ans=0.125
2023-11-18 03:51:35,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=45133.333333333336, ans=0.125
2023-11-18 03:51:38,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=45133.333333333336, ans=0.125
2023-11-18 03:51:40,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=45133.333333333336, ans=0.2
2023-11-18 03:51:59,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=45266.666666666664, ans=0.0
2023-11-18 03:52:08,434 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 6800, loss[loss=0.1085, simple_loss=0.1064, pruned_loss=0.03756, audio_tagging_loss=0.01773, over 14580.00 frames. ], tot_loss[loss=0.153, simple_loss=0.151, pruned_loss=0.06403, audio_tagging_loss=0.0135, over 3042967.67 frames. ], batch size: 55, lr: 3.87e-02, grad_scale: 32.0
2023-11-18 03:52:09,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=45333.333333333336, ans=0.125
2023-11-18 03:52:15,494 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.237e+01 1.104e+02 1.256e+02 1.386e+02 2.512e+02, threshold=2.511e+02, percent-clipped=1.0
2023-11-18 03:52:27,609 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.33 vs. limit=22.5
2023-11-18 03:52:28,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=45400.0, ans=0.0010000000000000009
2023-11-18 03:52:28,498 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.42 vs. limit=22.5
2023-11-18 03:52:34,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=45466.666666666664, ans=0.125
2023-11-18 03:52:45,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=45533.333333333336, ans=0.125
2023-11-18 03:53:05,398 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.19 vs. limit=6.0
2023-11-18 03:53:05,767 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 6850, loss[loss=0.1214, simple_loss=0.1188, pruned_loss=0.05152, audio_tagging_loss=0.01054, over 13906.00 frames. ], tot_loss[loss=0.1541, simple_loss=0.152, pruned_loss=0.06466, audio_tagging_loss=0.01345, over 3036153.95 frames. ], batch size: 54, lr: 3.87e-02, grad_scale: 32.0
2023-11-18 03:53:11,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=45666.666666666664, ans=0.125
2023-11-18 03:53:15,943 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.41 vs. limit=10.0
2023-11-18 03:53:17,995 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.97 vs. limit=15.0
2023-11-18 03:53:20,178 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.38 vs. limit=10.0
2023-11-18 03:53:35,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=45800.0, ans=0.0
2023-11-18 03:53:46,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=45866.666666666664, ans=0.1
2023-11-18 03:54:01,995 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 6900, loss[loss=0.1519, simple_loss=0.1555, pruned_loss=0.05901, audio_tagging_loss=0.01509, over 14040.00 frames. ], tot_loss[loss=0.1547, simple_loss=0.1531, pruned_loss=0.06487, audio_tagging_loss=0.01329, over 3040600.77 frames. ], batch size: 55, lr: 3.86e-02, grad_scale: 32.0
2023-11-18 03:54:08,335 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.826e+01 1.075e+02 1.233e+02 1.509e+02 2.353e+02, threshold=2.467e+02, percent-clipped=0.0
2023-11-18 03:54:09,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=46000.0, ans=0.05
2023-11-18 03:54:13,468 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.931e+00
2023-11-18 03:54:27,778 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=27.88 vs. limit=22.5
2023-11-18 03:54:28,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=46133.333333333336, ans=0.1
2023-11-18 03:54:45,373 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 03:54:58,482 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 6950, loss[loss=0.1675, simple_loss=0.1534, pruned_loss=0.07358, audio_tagging_loss=0.01727, over 14721.00 frames. ], tot_loss[loss=0.1542, simple_loss=0.1529, pruned_loss=0.06451, audio_tagging_loss=0.01323, over 3041310.12 frames. ], batch size: 55, lr: 3.85e-02, grad_scale: 32.0
2023-11-18 03:55:28,251 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.20 vs. limit=22.5
2023-11-18 03:55:35,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=46533.333333333336, ans=0.125
2023-11-18 03:55:55,798 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 7000, loss[loss=0.172, simple_loss=0.1589, pruned_loss=0.0758, audio_tagging_loss=0.01672, over 15083.00 frames. ], tot_loss[loss=0.1529, simple_loss=0.1517, pruned_loss=0.06382, audio_tagging_loss=0.01328, over 3046979.82 frames. ], batch size: 56, lr: 3.85e-02, grad_scale: 32.0
2023-11-18 03:55:57,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=46666.666666666664, ans=0.125
2023-11-18 03:56:02,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.13 vs. limit=10.0
2023-11-18 03:56:02,185 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.771e+01 1.138e+02 1.312e+02 1.485e+02 2.708e+02, threshold=2.623e+02, percent-clipped=2.0
2023-11-18 03:56:04,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=46666.666666666664, ans=0.125
2023-11-18 03:56:10,018 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.168e+00
2023-11-18 03:56:14,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=46733.333333333336, ans=0.015
2023-11-18 03:56:25,452 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.16 vs. limit=22.5
2023-11-18 03:56:37,155 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.53 vs. limit=15.0
2023-11-18 03:56:49,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=46933.333333333336, ans=0.0006666666666666661
2023-11-18 03:56:51,712 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 7050, loss[loss=0.1805, simple_loss=0.1759, pruned_loss=0.0778, audio_tagging_loss=0.01474, over 15550.00 frames. ], tot_loss[loss=0.1534, simple_loss=0.152, pruned_loss=0.06404, audio_tagging_loss=0.01336, over 3049346.18 frames. ], batch size: 57, lr: 3.84e-02, grad_scale: 32.0
2023-11-18 03:56:54,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=47000.0, ans=22.5
2023-11-18 03:56:55,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=47000.0, ans=0.2
2023-11-18 03:57:03,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=47066.666666666664, ans=0.1
2023-11-18 03:57:15,237 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.42 vs. limit=6.0
2023-11-18 03:57:24,281 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.33 vs. limit=6.0
2023-11-18 03:57:26,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=47200.0, ans=0.125
2023-11-18 03:57:35,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=47266.666666666664, ans=0.125
2023-11-18 03:57:41,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=47266.666666666664, ans=0.125
2023-11-18 03:57:47,884 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 7100, loss[loss=0.1613, simple_loss=0.161, pruned_loss=0.06746, audio_tagging_loss=0.01332, over 16258.00 frames. ], tot_loss[loss=0.1536, simple_loss=0.1523, pruned_loss=0.0639, audio_tagging_loss=0.0135, over 3052401.45 frames. ], batch size: 60, lr: 3.83e-02, grad_scale: 32.0
2023-11-18 03:57:55,365 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.676e+01 1.058e+02 1.182e+02 1.391e+02 1.929e+02, threshold=2.364e+02, percent-clipped=0.0
2023-11-18 03:58:06,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=47400.0, ans=0.125
2023-11-18 03:58:13,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=47466.666666666664, ans=0.1
2023-11-18 03:58:18,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=47466.666666666664, ans=0.0
2023-11-18 03:58:25,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=47533.333333333336, ans=0.0005362318840579708
2023-11-18 03:58:35,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=47600.0, ans=0.0
2023-11-18 03:58:45,124 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 7150, loss[loss=0.1551, simple_loss=0.1556, pruned_loss=0.05989, audio_tagging_loss=0.01737, over 15144.00 frames. ], tot_loss[loss=0.1532, simple_loss=0.1515, pruned_loss=0.06373, audio_tagging_loss=0.0137, over 3053344.51 frames. ], batch size: 57, lr: 3.83e-02, grad_scale: 32.0
2023-11-18 03:58:48,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=47666.666666666664, ans=0.2
2023-11-18 03:58:50,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=47666.666666666664, ans=0.0
2023-11-18 03:59:13,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=47800.0, ans=0.07
2023-11-18 03:59:14,012 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.79 vs. limit=15.0
2023-11-18 03:59:14,040 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.66 vs. limit=15.0
2023-11-18 03:59:14,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=47800.0, ans=0.0004782608695652179
2023-11-18 03:59:16,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=47866.666666666664, ans=0.0
2023-11-18 03:59:17,259 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.49 vs. limit=15.0
2023-11-18 03:59:40,936 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 7200, loss[loss=0.2066, simple_loss=0.1855, pruned_loss=0.1022, audio_tagging_loss=0.0116, over 14323.00 frames. ], tot_loss[loss=0.152, simple_loss=0.1499, pruned_loss=0.06319, audio_tagging_loss=0.01384, over 3046508.21 frames. ], batch size: 55, lr: 3.82e-02, grad_scale: 32.0
2023-11-18 03:59:43,542 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.20 vs. limit=15.0
2023-11-18 03:59:46,765 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.83 vs. limit=15.0
2023-11-18 03:59:47,296 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 1.038e+02 1.215e+02 1.416e+02 1.908e+02, threshold=2.429e+02, percent-clipped=0.0
2023-11-18 03:59:52,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=48066.666666666664, ans=0.0
2023-11-18 03:59:53,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=48066.666666666664, ans=0.2
2023-11-18 04:00:06,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=48133.333333333336, ans=0.2
2023-11-18 04:00:10,523 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0
2023-11-18 04:00:11,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=48133.333333333336, ans=0.1
2023-11-18 04:00:15,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=48200.0, ans=0.125
2023-11-18 04:00:24,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=48200.0, ans=0.1
2023-11-18 04:00:24,496 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.14 vs. limit=15.0
2023-11-18 04:00:33,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=48266.666666666664, ans=0.2
2023-11-18 04:00:37,476 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 7250, loss[loss=0.1249, simple_loss=0.141, pruned_loss=0.04071, audio_tagging_loss=0.01368, over 15050.00 frames. ], tot_loss[loss=0.1527, simple_loss=0.151, pruned_loss=0.06348, audio_tagging_loss=0.01377, over 3051339.66 frames. ], batch size: 55, lr: 3.82e-02, grad_scale: 32.0
2023-11-18 04:00:44,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=48333.333333333336, ans=0.2
2023-11-18 04:00:50,136 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.13 vs. limit=6.0
2023-11-18 04:00:54,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=48400.0, ans=0.125
2023-11-18 04:01:05,157 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.20 vs. limit=15.0
2023-11-18 04:01:18,204 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.17 vs. limit=10.0
2023-11-18 04:01:34,451 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 7300, loss[loss=0.1271, simple_loss=0.1202, pruned_loss=0.05256, audio_tagging_loss=0.0145, over 14657.00 frames. ], tot_loss[loss=0.1518, simple_loss=0.1503, pruned_loss=0.06307, audio_tagging_loss=0.01364, over 3047797.36 frames. ], batch size: 56, lr: 3.81e-02, grad_scale: 32.0
2023-11-18 04:01:40,951 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.111e+01 1.121e+02 1.282e+02 1.467e+02 2.763e+02, threshold=2.564e+02, percent-clipped=2.0
2023-11-18 04:01:43,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=48666.666666666664, ans=0.125
2023-11-18 04:01:52,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=48733.333333333336, ans=0.0002753623188405784
2023-11-18 04:01:54,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=48800.0, ans=0.2
2023-11-18 04:01:54,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=48800.0, ans=0.5
2023-11-18 04:02:06,846 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.35 vs. limit=15.0
2023-11-18 04:02:15,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=48866.666666666664, ans=10.0
2023-11-18 04:02:15,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=48866.666666666664, ans=0.125
2023-11-18 04:02:20,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=48933.333333333336, ans=0.2
2023-11-18 04:02:30,021 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 7350, loss[loss=0.2159, simple_loss=0.2214, pruned_loss=0.09403, audio_tagging_loss=0.01114, over 15209.00 frames. ], tot_loss[loss=0.1535, simple_loss=0.1523, pruned_loss=0.06387, audio_tagging_loss=0.01346, over 3055393.85 frames. ], batch size: 55, lr: 3.80e-02, grad_scale: 32.0
2023-11-18 04:02:42,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=49066.666666666664, ans=0.125
2023-11-18 04:02:49,683 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0
2023-11-18 04:03:01,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=49133.333333333336, ans=0.1
2023-11-18 04:03:26,843 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 7400, loss[loss=0.1747, simple_loss=0.1751, pruned_loss=0.07422, audio_tagging_loss=0.01291, over 15470.00 frames. ], tot_loss[loss=0.153, simple_loss=0.1522, pruned_loss=0.06351, audio_tagging_loss=0.01341, over 3049299.68 frames. ], batch size: 57, lr: 3.80e-02, grad_scale: 32.0
2023-11-18 04:03:31,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=49333.333333333336, ans=0.0
2023-11-18 04:03:33,829 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.201e+01 1.102e+02 1.229e+02 1.424e+02 2.293e+02, threshold=2.457e+02, percent-clipped=0.0
2023-11-18 04:03:41,505 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.12 vs. limit=10.0
2023-11-18 04:03:45,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=49400.0, ans=0.125
2023-11-18 04:04:09,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=49533.333333333336, ans=0.125
2023-11-18 04:04:11,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=49600.0, ans=0.1
2023-11-18 04:04:14,150 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=8.630e+00
2023-11-18 04:04:23,578 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 7450, loss[loss=0.1768, simple_loss=0.1657, pruned_loss=0.07971, audio_tagging_loss=0.01421, over 14917.00 frames. ], tot_loss[loss=0.1523, simple_loss=0.1512, pruned_loss=0.06333, audio_tagging_loss=0.01342, over 3044933.88 frames. ], batch size: 56, lr: 3.79e-02, grad_scale: 32.0
2023-11-18 04:04:23,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=49666.666666666664, ans=0.1
2023-11-18 04:04:40,859 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.46 vs. limit=15.0
2023-11-18 04:04:43,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=49733.333333333336, ans=0.0
2023-11-18 04:04:56,425 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.48 vs. limit=15.0
2023-11-18 04:05:04,075 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 04:05:06,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=49866.666666666664, ans=0.1
2023-11-18 04:05:12,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=49933.333333333336, ans=1.449275362318779e-05
2023-11-18 04:05:18,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=49933.333333333336, ans=0.1
2023-11-18 04:05:20,161 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 7500, loss[loss=0.1644, simple_loss=0.1693, pruned_loss=0.067, audio_tagging_loss=0.01272, over 15485.00 frames. ], tot_loss[loss=0.1534, simple_loss=0.1523, pruned_loss=0.06403, audio_tagging_loss=0.0132, over 3046025.69 frames. ], batch size: 57, lr: 3.78e-02, grad_scale: 32.0
2023-11-18 04:05:26,551 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 1.063e+02 1.222e+02 1.436e+02 2.018e+02, threshold=2.444e+02, percent-clipped=0.0
2023-11-18 04:05:27,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=50000.0, ans=0.125
2023-11-18 04:05:57,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=50200.0, ans=0.2
2023-11-18 04:06:15,869 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 7550, loss[loss=0.1786, simple_loss=0.1849, pruned_loss=0.07283, audio_tagging_loss=0.01329, over 15935.00 frames. ], tot_loss[loss=0.1523, simple_loss=0.151, pruned_loss=0.0635, audio_tagging_loss=0.01328, over 3047232.39 frames. ], batch size: 60, lr: 3.78e-02, grad_scale: 32.0
2023-11-18 04:06:16,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=50333.333333333336, ans=0.125
2023-11-18 04:06:16,435 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.76 vs. limit=15.0
2023-11-18 04:06:30,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=50400.0, ans=0.09899494936611666
2023-11-18 04:07:09,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=50600.0, ans=0.2
2023-11-18 04:07:12,580 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 7600, loss[loss=0.1006, simple_loss=0.09616, pruned_loss=0.03854, audio_tagging_loss=0.01395, over 14859.00 frames. ], tot_loss[loss=0.1515, simple_loss=0.15, pruned_loss=0.06314, audio_tagging_loss=0.01336, over 3051941.24 frames. ], batch size: 58, lr: 3.77e-02, grad_scale: 32.0
2023-11-18 04:07:19,600 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.452e+01 1.053e+02 1.216e+02 1.364e+02 2.093e+02, threshold=2.431e+02, percent-clipped=0.0
2023-11-18 04:07:20,293 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.60 vs. limit=15.0
2023-11-18 04:07:33,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=50733.333333333336, ans=0.1
2023-11-18 04:07:52,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=50866.666666666664, ans=0.0
2023-11-18 04:07:52,856 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.30 vs. limit=6.0
2023-11-18 04:07:54,081 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.80 vs. limit=15.0
2023-11-18 04:07:57,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=50933.333333333336, ans=0.1
2023-11-18 04:08:01,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=50933.333333333336, ans=0.09899494936611666
2023-11-18 04:08:05,289 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.35 vs. limit=15.0
2023-11-18 04:08:07,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=50933.333333333336, ans=0.125
2023-11-18 04:08:09,174 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 7650, loss[loss=0.1645, simple_loss=0.1624, pruned_loss=0.06826, audio_tagging_loss=0.01509, over 15486.00 frames. ], tot_loss[loss=0.1505, simple_loss=0.1491, pruned_loss=0.06265, audio_tagging_loss=0.01336, over 3046404.15 frames. ], batch size: 57, lr: 3.77e-02, grad_scale: 32.0
2023-11-18 04:08:10,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=51000.0, ans=0.125
2023-11-18 04:08:14,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=51000.0, ans=0.2
2023-11-18 04:08:31,689 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 04:08:32,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=51133.333333333336, ans=0.125
2023-11-18 04:08:37,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=51133.333333333336, ans=0.2
2023-11-18 04:08:39,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=51133.333333333336, ans=0.0
2023-11-18 04:08:42,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=51200.0, ans=0.2
2023-11-18 04:08:47,071 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.34 vs. limit=15.0
2023-11-18 04:08:49,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=51200.0, ans=0.0
2023-11-18 04:09:00,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=51266.666666666664, ans=22.5
2023-11-18 04:09:05,123 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 7700, loss[loss=0.1649, simple_loss=0.1766, pruned_loss=0.06421, audio_tagging_loss=0.01244, over 15391.00 frames. ], tot_loss[loss=0.1507, simple_loss=0.1493, pruned_loss=0.06263, audio_tagging_loss=0.01343, over 3043491.17 frames. ], batch size: 55, lr: 3.76e-02, grad_scale: 32.0
2023-11-18 04:09:06,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=51333.333333333336, ans=0.125
2023-11-18 04:09:12,119 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.947e+01 1.072e+02 1.285e+02 1.536e+02 2.038e+02, threshold=2.570e+02, percent-clipped=0.0
2023-11-18 04:09:15,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=51400.0, ans=0.125
2023-11-18 04:09:17,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=51400.0, ans=0.125
2023-11-18 04:09:28,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=51466.666666666664, ans=0.125
2023-11-18 04:09:53,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=51600.0, ans=0.2
2023-11-18 04:10:01,756 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 7750, loss[loss=0.1581, simple_loss=0.1539, pruned_loss=0.07026, audio_tagging_loss=0.01091, over 14865.00 frames. ], tot_loss[loss=0.1519, simple_loss=0.1507, pruned_loss=0.06326, audio_tagging_loss=0.01335, over 3047124.65 frames. ], batch size: 57, lr: 3.75e-02, grad_scale: 64.0
2023-11-18 04:10:04,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=51666.666666666664, ans=0.125
2023-11-18 04:10:11,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=51666.666666666664, ans=0.0
2023-11-18 04:10:12,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=51733.333333333336, ans=0.1
2023-11-18 04:10:39,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=51866.666666666664, ans=0.125
2023-11-18 04:10:40,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=51866.666666666664, ans=0.125
2023-11-18 04:10:54,585 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=36.77 vs. limit=15.0
2023-11-18 04:10:58,486 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 7800, loss[loss=0.2063, simple_loss=0.2037, pruned_loss=0.09286, audio_tagging_loss=0.01162, over 15344.00 frames. ], tot_loss[loss=0.1517, simple_loss=0.1507, pruned_loss=0.06299, audio_tagging_loss=0.01341, over 3050829.36 frames. ], batch size: 56, lr: 3.75e-02, grad_scale: 64.0
2023-11-18 04:11:01,144 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.63 vs. limit=15.0
2023-11-18 04:11:04,840 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.657e+01 1.122e+02 1.272e+02 1.519e+02 2.538e+02, threshold=2.545e+02, percent-clipped=0.0
2023-11-18 04:11:09,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=52066.666666666664, ans=0.0
2023-11-18 04:11:12,291 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.13 vs. limit=15.0
2023-11-18 04:11:13,683 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.91 vs. limit=15.0
2023-11-18 04:11:21,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=52133.333333333336, ans=0.1
2023-11-18 04:11:54,967 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 7850, loss[loss=0.1476, simple_loss=0.1521, pruned_loss=0.06, audio_tagging_loss=0.01153, over 15034.00 frames. ], tot_loss[loss=0.1519, simple_loss=0.1509, pruned_loss=0.06302, audio_tagging_loss=0.01339, over 3046595.72 frames. ], batch size: 56, lr: 3.74e-02, grad_scale: 64.0
2023-11-18 04:11:55,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=52333.333333333336, ans=0.125
2023-11-18 04:12:03,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=52333.333333333336, ans=0.1
2023-11-18 04:12:34,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=52533.333333333336, ans=0.125
2023-11-18 04:12:43,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=52600.0, ans=0.0
2023-11-18 04:12:51,439 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 7900, loss[loss=0.1544, simple_loss=0.147, pruned_loss=0.06373, audio_tagging_loss=0.01719, over 14940.00 frames. ], tot_loss[loss=0.1514, simple_loss=0.1504, pruned_loss=0.06263, audio_tagging_loss=0.01352, over 3052538.94 frames. ], batch size: 56, lr: 3.73e-02, grad_scale: 64.0
2023-11-18 04:12:51,956 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.62 vs. limit=10.0
2023-11-18 04:12:54,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=52666.666666666664, ans=0.1
2023-11-18 04:12:58,419 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.654e+01 1.083e+02 1.346e+02 1.574e+02 2.605e+02, threshold=2.691e+02, percent-clipped=2.0
2023-11-18 04:13:09,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=52733.333333333336, ans=0.1
2023-11-18 04:13:40,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=52933.333333333336, ans=0.125
2023-11-18 04:13:41,320 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 04:13:41,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=52933.333333333336, ans=0.0
2023-11-18 04:13:43,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=52933.333333333336, ans=0.0
2023-11-18 04:13:47,566 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 7950, loss[loss=0.1842, simple_loss=0.1907, pruned_loss=0.07945, audio_tagging_loss=0.009381, over 15373.00 frames. ], tot_loss[loss=0.1508, simple_loss=0.1498, pruned_loss=0.06213, audio_tagging_loss=0.01373, over 3047821.37 frames. ], batch size: 55, lr: 3.73e-02, grad_scale: 64.0
2023-11-18 04:13:59,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=53066.666666666664, ans=0.125
2023-11-18 04:13:59,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=53066.666666666664, ans=0.09899494936611666
2023-11-18 04:14:00,010 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.83 vs. limit=10.0
2023-11-18 04:14:01,622 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 04:14:18,082 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.25 vs. limit=15.0
2023-11-18 04:14:22,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=53200.0, ans=0.125
2023-11-18 04:14:24,081 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.57 vs. limit=15.0
2023-11-18 04:14:35,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=53266.666666666664, ans=0.2
2023-11-18 04:14:45,282 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 8000, loss[loss=0.1338, simple_loss=0.1317, pruned_loss=0.05589, audio_tagging_loss=0.0121, over 14823.00 frames. ], tot_loss[loss=0.1488, simple_loss=0.1479, pruned_loss=0.06113, audio_tagging_loss=0.01371, over 3041980.69 frames. ], batch size: 55, lr: 3.72e-02, grad_scale: 64.0
2023-11-18 04:14:52,304 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.958e+01 1.064e+02 1.192e+02 1.330e+02 2.518e+02, threshold=2.384e+02, percent-clipped=0.0
2023-11-18 04:14:56,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=53400.0, ans=0.0
2023-11-18 04:15:18,646 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.67 vs. limit=15.0
2023-11-18 04:15:21,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=53533.333333333336, ans=0.1
2023-11-18 04:15:26,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=53533.333333333336, ans=0.0
2023-11-18 04:15:41,084 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 8050, loss[loss=0.1558, simple_loss=0.1544, pruned_loss=0.06663, audio_tagging_loss=0.012, over 14121.00 frames. ], tot_loss[loss=0.1493, simple_loss=0.1483, pruned_loss=0.06138, audio_tagging_loss=0.01378, over 3037537.71 frames. ], batch size: 55, lr: 3.72e-02, grad_scale: 64.0
2023-11-18 04:15:44,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=53666.666666666664, ans=0.125
2023-11-18 04:15:47,495 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.57 vs. limit=15.0
2023-11-18 04:15:50,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=53666.666666666664, ans=0.1
2023-11-18 04:15:52,184 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.67 vs. limit=22.5
2023-11-18 04:15:56,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=53733.333333333336, ans=0.125
2023-11-18 04:16:22,045 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.07 vs. limit=15.0
2023-11-18 04:16:25,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=53933.333333333336, ans=0.125
2023-11-18 04:16:37,247 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.00 vs. limit=15.0
2023-11-18 04:16:37,487 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 8100, loss[loss=0.144, simple_loss=0.1404, pruned_loss=0.06085, audio_tagging_loss=0.01296, over 14527.00 frames. ], tot_loss[loss=0.1496, simple_loss=0.149, pruned_loss=0.06163, audio_tagging_loss=0.0135, over 3036362.80 frames. ], batch size: 56, lr: 3.71e-02, grad_scale: 64.0
2023-11-18 04:16:42,554 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.11 vs. limit=15.0
2023-11-18 04:16:43,799 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 1.055e+02 1.175e+02 1.442e+02 1.996e+02, threshold=2.349e+02, percent-clipped=0.0
2023-11-18 04:17:01,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=54133.333333333336, ans=0.09899494936611666
2023-11-18 04:17:04,789 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.15 vs. limit=22.5
2023-11-18 04:17:07,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=54133.333333333336, ans=0.125
2023-11-18 04:17:11,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=54200.0, ans=0.125
2023-11-18 04:17:24,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=54266.666666666664, ans=0.1
2023-11-18 04:17:32,889 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 8150, loss[loss=0.1212, simple_loss=0.1159, pruned_loss=0.0476, audio_tagging_loss=0.01567, over 15579.00 frames. ], tot_loss[loss=0.1509, simple_loss=0.1504, pruned_loss=0.06234, audio_tagging_loss=0.01332, over 3041764.14 frames. ], batch size: 60, lr: 3.70e-02, grad_scale: 64.0
2023-11-18 04:18:29,070 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 8200, loss[loss=0.1389, simple_loss=0.1356, pruned_loss=0.05754, audio_tagging_loss=0.01361, over 16300.00 frames. ], tot_loss[loss=0.1513, simple_loss=0.1511, pruned_loss=0.06261, audio_tagging_loss=0.0131, over 3051692.43 frames. ], batch size: 64, lr: 3.70e-02, grad_scale: 32.0
2023-11-18 04:18:30,780 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 04:18:34,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=54666.666666666664, ans=0.125
2023-11-18 04:18:37,084 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.282e+01 1.076e+02 1.233e+02 1.443e+02 5.591e+02, threshold=2.467e+02, percent-clipped=1.0
2023-11-18 04:18:37,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=54666.666666666664, ans=0.5
2023-11-18 04:18:38,751 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.15 vs. limit=15.0
2023-11-18 04:18:54,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=54800.0, ans=0.035
2023-11-18 04:19:00,302 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=7.981e+00
2023-11-18 04:19:04,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=54866.666666666664, ans=0.1
2023-11-18 04:19:11,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=54866.666666666664, ans=0.04949747468305833
2023-11-18 04:19:19,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=54933.333333333336, ans=0.125
2023-11-18 04:19:19,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=54933.333333333336, ans=0.125
2023-11-18 04:19:25,764 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 8250, loss[loss=0.1403, simple_loss=0.1446, pruned_loss=0.0545, audio_tagging_loss=0.01356, over 16524.00 frames. ], tot_loss[loss=0.1519, simple_loss=0.1515, pruned_loss=0.06293, audio_tagging_loss=0.01319, over 3053071.89 frames. ], batch size: 63, lr: 3.69e-02, grad_scale: 32.0
2023-11-18 04:19:30,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=55000.0, ans=0.04949747468305833
2023-11-18 04:19:31,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=55000.0, ans=0.04949747468305833
2023-11-18 04:19:32,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=55000.0, ans=0.125
2023-11-18 04:20:03,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=55200.0, ans=0.0
2023-11-18 04:20:20,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=55333.333333333336, ans=0.125
2023-11-18 04:20:21,386 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 8300, loss[loss=0.1328, simple_loss=0.1337, pruned_loss=0.0497, audio_tagging_loss=0.01628, over 14504.00 frames. ], tot_loss[loss=0.151, simple_loss=0.1504, pruned_loss=0.06261, audio_tagging_loss=0.01317, over 3056470.10 frames. ], batch size: 53, lr: 3.68e-02, grad_scale: 32.0
2023-11-18 04:20:26,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=55333.333333333336, ans=0.125
2023-11-18 04:20:28,699 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.605e+01 1.079e+02 1.222e+02 1.465e+02 2.413e+02, threshold=2.444e+02, percent-clipped=0.0
2023-11-18 04:20:28,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=55333.333333333336, ans=0.2
2023-11-18 04:20:33,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=55400.0, ans=0.015
2023-11-18 04:20:36,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=55400.0, ans=0.125
2023-11-18 04:21:06,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=55600.0, ans=0.0
2023-11-18 04:21:13,292 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 04:21:17,287 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 8350, loss[loss=0.1205, simple_loss=0.1263, pruned_loss=0.04796, audio_tagging_loss=0.009326, over 14961.00 frames. ], tot_loss[loss=0.1499, simple_loss=0.1493, pruned_loss=0.06213, audio_tagging_loss=0.01315, over 3055228.54 frames. ], batch size: 57, lr: 3.68e-02, grad_scale: 32.0
2023-11-18 04:21:19,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=55666.666666666664, ans=0.125
2023-11-18 04:21:44,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=55800.0, ans=0.0
2023-11-18 04:21:51,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=55866.666666666664, ans=0.0
2023-11-18 04:21:52,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=55866.666666666664, ans=0.125
2023-11-18 04:22:07,036 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 04:22:14,450 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 8400, loss[loss=0.1499, simple_loss=0.1506, pruned_loss=0.06508, audio_tagging_loss=0.009508, over 15191.00 frames. ], tot_loss[loss=0.149, simple_loss=0.1486, pruned_loss=0.06146, audio_tagging_loss=0.01318, over 3055950.76 frames. ], batch size: 57, lr: 3.67e-02, grad_scale: 32.0
2023-11-18 04:22:15,133 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.80 vs. limit=15.0
2023-11-18 04:22:21,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=56000.0, ans=0.125
2023-11-18 04:22:21,885 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.450e+01 1.072e+02 1.183e+02 1.364e+02 2.045e+02, threshold=2.367e+02, percent-clipped=0.0
2023-11-18 04:22:22,488 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.66 vs. limit=15.0
2023-11-18 04:22:23,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=56000.0, ans=0.0
2023-11-18 04:22:33,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=56066.666666666664, ans=0.125
2023-11-18 04:22:57,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=56200.0, ans=0.125
2023-11-18 04:23:02,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=56266.666666666664, ans=0.125
2023-11-18 04:23:09,887 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 8450, loss[loss=0.1519, simple_loss=0.1488, pruned_loss=0.06257, audio_tagging_loss=0.01494, over 14851.00 frames. ], tot_loss[loss=0.1491, simple_loss=0.1483, pruned_loss=0.06152, audio_tagging_loss=0.01337, over 3051576.55 frames. ], batch size: 56, lr: 3.67e-02, grad_scale: 32.0
2023-11-18 04:23:23,662 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.93 vs. limit=15.0
2023-11-18 04:23:29,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=56400.0, ans=0.1
2023-11-18 04:23:45,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=56533.333333333336, ans=0.125
2023-11-18 04:24:05,591 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 8500, loss[loss=0.1874, simple_loss=0.1914, pruned_loss=0.08221, audio_tagging_loss=0.009511, over 15113.00 frames. ], tot_loss[loss=0.1485, simple_loss=0.148, pruned_loss=0.06113, audio_tagging_loss=0.01342, over 3052987.01 frames. ], batch size: 56, lr: 3.66e-02, grad_scale: 32.0
2023-11-18 04:24:09,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=56666.666666666664, ans=0.0
2023-11-18 04:24:13,495 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 1.081e+02 1.253e+02 1.521e+02 2.592e+02, threshold=2.506e+02, percent-clipped=2.0
2023-11-18 04:24:22,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=56733.333333333336, ans=0.015
2023-11-18 04:24:24,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=56733.333333333336, ans=0.2
2023-11-18 04:24:31,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=56800.0, ans=0.125
2023-11-18 04:24:32,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=56800.0, ans=0.125
2023-11-18 04:24:58,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=56933.333333333336, ans=0.125
2023-11-18 04:25:02,319 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 8550, loss[loss=0.1407, simple_loss=0.1417, pruned_loss=0.05391, audio_tagging_loss=0.01593, over 15842.00 frames. ], tot_loss[loss=0.148, simple_loss=0.1478, pruned_loss=0.06058, audio_tagging_loss=0.01349, over 3054237.28 frames. ], batch size: 59, lr: 3.65e-02, grad_scale: 32.0
2023-11-18 04:25:39,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=57200.0, ans=0.0
2023-11-18 04:25:52,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=57266.666666666664, ans=0.05
2023-11-18 04:25:57,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=57333.333333333336, ans=0.125
2023-11-18 04:25:58,794 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 8600, loss[loss=0.1225, simple_loss=0.1208, pruned_loss=0.04992, audio_tagging_loss=0.01219, over 16406.00 frames. ], tot_loss[loss=0.1481, simple_loss=0.148, pruned_loss=0.06068, audio_tagging_loss=0.01342, over 3057626.84 frames. ], batch size: 63, lr: 3.65e-02, grad_scale: 32.0
2023-11-18 04:26:06,163 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.067e+01 1.036e+02 1.166e+02 1.373e+02 2.331e+02, threshold=2.332e+02, percent-clipped=0.0
2023-11-18 04:26:31,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=57533.333333333336, ans=0.0
2023-11-18 04:26:35,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=57533.333333333336, ans=15.0
2023-11-18 04:26:55,046 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 8650, loss[loss=0.1498, simple_loss=0.1341, pruned_loss=0.06507, audio_tagging_loss=0.01768, over 14723.00 frames. ], tot_loss[loss=0.1488, simple_loss=0.1488, pruned_loss=0.06081, audio_tagging_loss=0.01359, over 3059130.61 frames. ], batch size: 58, lr: 3.64e-02, grad_scale: 32.0
2023-11-18 04:27:11,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=57733.333333333336, ans=0.0
2023-11-18 04:27:43,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=57933.333333333336, ans=0.125
2023-11-18 04:27:51,161 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 8700, loss[loss=0.1369, simple_loss=0.1419, pruned_loss=0.05428, audio_tagging_loss=0.01169, over 15566.00 frames. ], tot_loss[loss=0.1496, simple_loss=0.1496, pruned_loss=0.0613, audio_tagging_loss=0.01348, over 3056632.58 frames. ], batch size: 58, lr: 3.64e-02, grad_scale: 32.0
2023-11-18 04:27:59,160 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.784e+01 1.148e+02 1.309e+02 1.555e+02 2.620e+02, threshold=2.618e+02, percent-clipped=1.0
2023-11-18 04:28:00,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=58000.0, ans=0.2
2023-11-18 04:28:00,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=58000.0, ans=0.125
2023-11-18 04:28:03,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=58066.666666666664, ans=0.0
2023-11-18 04:28:05,612 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.90 vs. limit=22.5
2023-11-18 04:28:20,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=58133.333333333336, ans=0.125
2023-11-18 04:28:27,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=58200.0, ans=0.125
2023-11-18 04:28:47,655 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 8750, loss[loss=0.1633, simple_loss=0.1691, pruned_loss=0.06683, audio_tagging_loss=0.0119, over 14767.00 frames. ], tot_loss[loss=0.1498, simple_loss=0.15, pruned_loss=0.06117, audio_tagging_loss=0.0136, over 3055142.82 frames. ], batch size: 54, lr: 3.63e-02, grad_scale: 32.0
2023-11-18 04:28:50,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=58333.333333333336, ans=0.125
2023-11-18 04:28:58,815 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.57 vs. limit=15.0
2023-11-18 04:28:59,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=58400.0, ans=0.0
2023-11-18 04:29:40,133 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.94 vs. limit=12.0
2023-11-18 04:29:42,991 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 8800, loss[loss=0.1744, simple_loss=0.1844, pruned_loss=0.07066, audio_tagging_loss=0.01152, over 15392.00 frames. ], tot_loss[loss=0.1514, simple_loss=0.1517, pruned_loss=0.06186, audio_tagging_loss=0.01369, over 3050521.56 frames. ], batch size: 56, lr: 3.62e-02, grad_scale: 32.0
2023-11-18 04:29:45,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=58666.666666666664, ans=0.0
2023-11-18 04:29:50,757 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 9.076e+01 1.175e+02 1.354e+02 1.562e+02 2.721e+02, threshold=2.708e+02, percent-clipped=1.0
2023-11-18 04:29:54,985 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.06 vs. limit=15.0
2023-11-18 04:30:39,316 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 8850, loss[loss=0.1154, simple_loss=0.1122, pruned_loss=0.04579, audio_tagging_loss=0.01347, over 14974.00 frames. ], tot_loss[loss=0.1507, simple_loss=0.151, pruned_loss=0.06149, audio_tagging_loss=0.01366, over 3046077.24 frames. ], batch size: 58, lr: 3.62e-02, grad_scale: 32.0
2023-11-18 04:30:43,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=59000.0, ans=0.125
2023-11-18 04:30:43,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=59000.0, ans=0.0
2023-11-18 04:30:47,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=59000.0, ans=0.1
2023-11-18 04:30:48,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=59000.0, ans=0.125
2023-11-18 04:30:51,661 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23.
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 04:30:57,481 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.77 vs. limit=10.0 2023-11-18 04:31:19,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=59200.0, ans=0.1 2023-11-18 04:31:35,261 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 8900, loss[loss=0.1651, simple_loss=0.1744, pruned_loss=0.0666, audio_tagging_loss=0.0113, over 15666.00 frames. ], tot_loss[loss=0.1506, simple_loss=0.151, pruned_loss=0.0615, audio_tagging_loss=0.01361, over 3043749.37 frames. ], batch size: 58, lr: 3.61e-02, grad_scale: 32.0 2023-11-18 04:31:43,298 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.181e+01 1.040e+02 1.138e+02 1.318e+02 1.926e+02, threshold=2.277e+02, percent-clipped=0.0 2023-11-18 04:32:01,844 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.80 vs. limit=12.0 2023-11-18 04:32:03,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=59466.666666666664, ans=0.2 2023-11-18 04:32:14,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=59533.333333333336, ans=0.1 2023-11-18 04:32:26,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=59600.0, ans=0.125 2023-11-18 04:32:26,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=59600.0, ans=0.0 2023-11-18 04:32:30,889 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 8950, loss[loss=0.134, simple_loss=0.1283, pruned_loss=0.05688, audio_tagging_loss=0.01298, over 13477.00 frames. ], tot_loss[loss=0.1492, simple_loss=0.1497, pruned_loss=0.06101, audio_tagging_loss=0.01331, over 3048674.46 frames. ], batch size: 54, lr: 3.61e-02, grad_scale: 32.0 2023-11-18 04:32:51,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=59733.333333333336, ans=0.125 2023-11-18 04:32:52,086 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.34 vs. limit=12.0 2023-11-18 04:32:56,173 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0 2023-11-18 04:32:56,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=59800.0, ans=0.125 2023-11-18 04:33:04,669 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.84 vs. 
limit=15.0 2023-11-18 04:33:08,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=59866.666666666664, ans=0.0 2023-11-18 04:33:14,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=59933.333333333336, ans=0.125 2023-11-18 04:33:16,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=59933.333333333336, ans=0.125 2023-11-18 04:33:27,437 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 9000, loss[loss=0.1874, simple_loss=0.1929, pruned_loss=0.07791, audio_tagging_loss=0.01306, over 15406.00 frames. ], tot_loss[loss=0.1488, simple_loss=0.1493, pruned_loss=0.06085, audio_tagging_loss=0.01329, over 3045322.23 frames. ], batch size: 56, lr: 3.60e-02, grad_scale: 32.0 2023-11-18 04:33:27,437 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-18 04:33:58,285 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.8949, 1.2107, 1.6955, 1.2605, 0.9857, 2.1065, 1.9183, 1.8034], device='cuda:3') 2023-11-18 04:34:01,191 INFO [train_asr.py:1147] (3/4) Epoch 1, validation: loss=0.0967, simple_loss=0.07481, pruned_loss=0.01931, audio_tagging_loss=0.03999, over 4681554.00 frames. 2023-11-18 04:34:01,192 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-18 04:34:02,689 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=15.0 2023-11-18 04:34:09,102 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.690e+01 1.047e+02 1.193e+02 1.407e+02 2.407e+02, threshold=2.385e+02, percent-clipped=1.0 2023-11-18 04:34:16,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=60066.666666666664, ans=0.125 2023-11-18 04:34:25,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=60133.333333333336, ans=0.0 2023-11-18 04:34:26,591 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.61 vs. limit=6.0 2023-11-18 04:34:26,657 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.48 vs. limit=22.5 2023-11-18 04:34:34,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=60200.0, ans=0.04949747468305833 2023-11-18 04:34:57,627 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 9050, loss[loss=0.1967, simple_loss=0.2002, pruned_loss=0.08891, audio_tagging_loss=0.00768, over 15527.00 frames. ], tot_loss[loss=0.1492, simple_loss=0.1499, pruned_loss=0.06104, audio_tagging_loss=0.01324, over 3039389.60 frames. ], batch size: 57, lr: 3.59e-02, grad_scale: 32.0 2023-11-18 04:35:03,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=60333.333333333336, ans=0.1 2023-11-18 04:35:06,249 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.55 vs. 
limit=12.0 2023-11-18 04:35:32,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=60533.333333333336, ans=0.0 2023-11-18 04:35:33,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=60533.333333333336, ans=0.0 2023-11-18 04:35:40,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=60600.0, ans=0.125 2023-11-18 04:35:50,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=60600.0, ans=0.125 2023-11-18 04:35:50,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=60600.0, ans=0.125 2023-11-18 04:35:52,804 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 9100, loss[loss=0.1746, simple_loss=0.1821, pruned_loss=0.07041, audio_tagging_loss=0.0131, over 15419.00 frames. ], tot_loss[loss=0.1488, simple_loss=0.1499, pruned_loss=0.06077, audio_tagging_loss=0.0131, over 3046168.96 frames. ], batch size: 56, lr: 3.59e-02, grad_scale: 32.0 2023-11-18 04:36:00,841 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.380e+01 1.098e+02 1.291e+02 1.456e+02 2.208e+02, threshold=2.583e+02, percent-clipped=0.0 2023-11-18 04:36:01,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=60666.666666666664, ans=0.0 2023-11-18 04:36:06,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=60733.333333333336, ans=0.0 2023-11-18 04:36:15,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=60800.0, ans=0.04949747468305833 2023-11-18 04:36:37,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=60933.333333333336, ans=0.125 2023-11-18 04:36:48,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=61000.0, ans=0.125 2023-11-18 04:36:48,977 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 9150, loss[loss=0.1324, simple_loss=0.1328, pruned_loss=0.05055, audio_tagging_loss=0.01541, over 14629.00 frames. ], tot_loss[loss=0.1475, simple_loss=0.1486, pruned_loss=0.06017, audio_tagging_loss=0.01302, over 3038027.58 frames. ], batch size: 55, lr: 3.58e-02, grad_scale: 32.0 2023-11-18 04:37:44,913 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 9200, loss[loss=0.132, simple_loss=0.136, pruned_loss=0.04793, audio_tagging_loss=0.01609, over 14147.00 frames. ], tot_loss[loss=0.1473, simple_loss=0.1483, pruned_loss=0.06005, audio_tagging_loss=0.01312, over 3037852.45 frames. 
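The validation block above (batch 9000) prints per-module attn_weights_entropy tensors with one value per attention head, plus a peak-memory line. A plausible reconstruction of both diagnostics is sketched below, assuming attention weights of shape (batch, num_heads, tgt_len, src_len); the exact reduction used inside zipformer.py is not shown in the log.

```python
import torch


def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    """Mean entropy of the attention distribution, one value per head.

    attn: (batch, num_heads, tgt_len, src_len) with rows summing to 1.
    Uniform attention over L positions gives log(L); sharp attention gives ~0.
    """
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (batch, heads, tgt_len)
    return ent.mean(dim=(0, 2))                     # (num_heads,)


def max_memory_mb(device: torch.device) -> int:
    # The standard way to obtain the "Maximum memory allocated so far" figure.
    return torch.cuda.max_memory_allocated(device) // (1024 * 1024)
```

Low entropy in a head indicates it has collapsed onto a few positions; logging these values per layer is a cheap health check during validation.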
], batch size: 53, lr: 3.58e-02, grad_scale: 32.0 2023-11-18 04:37:46,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=61333.333333333336, ans=0.2 2023-11-18 04:37:46,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=61333.333333333336, ans=0.0 2023-11-18 04:37:50,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=61333.333333333336, ans=0.125 2023-11-18 04:37:52,956 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.523e+01 1.127e+02 1.318e+02 1.536e+02 2.303e+02, threshold=2.636e+02, percent-clipped=0.0 2023-11-18 04:37:53,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=61333.333333333336, ans=0.1 2023-11-18 04:37:55,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=61400.0, ans=0.0 2023-11-18 04:38:07,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=61466.666666666664, ans=0.95 2023-11-18 04:38:26,288 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:38:34,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=61600.0, ans=0.125 2023-11-18 04:38:36,471 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.19 vs. limit=22.5 2023-11-18 04:38:36,474 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0 2023-11-18 04:38:41,845 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 9250, loss[loss=0.1476, simple_loss=0.153, pruned_loss=0.0616, audio_tagging_loss=0.009509, over 15660.00 frames. ], tot_loss[loss=0.1479, simple_loss=0.1489, pruned_loss=0.06041, audio_tagging_loss=0.01304, over 3047985.47 frames. ], batch size: 58, lr: 3.57e-02, grad_scale: 32.0 2023-11-18 04:38:46,178 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.02 vs. limit=15.0 2023-11-18 04:38:53,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=61733.333333333336, ans=0.015 2023-11-18 04:38:59,095 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.94 vs. 
limit=15.0 2023-11-18 04:39:03,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=61800.0, ans=0.125 2023-11-18 04:39:15,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=61866.666666666664, ans=0.125 2023-11-18 04:39:23,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=61866.666666666664, ans=0.125 2023-11-18 04:39:24,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=61866.666666666664, ans=0.125 2023-11-18 04:39:37,695 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 9300, loss[loss=0.1014, simple_loss=0.1014, pruned_loss=0.03406, audio_tagging_loss=0.01663, over 15317.00 frames. ], tot_loss[loss=0.1466, simple_loss=0.1473, pruned_loss=0.05976, audio_tagging_loss=0.01313, over 3043823.55 frames. ], batch size: 59, lr: 3.57e-02, grad_scale: 32.0 2023-11-18 04:39:45,024 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.224e+01 1.082e+02 1.160e+02 1.352e+02 1.912e+02, threshold=2.319e+02, percent-clipped=0.0 2023-11-18 04:39:57,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=62066.666666666664, ans=0.2 2023-11-18 04:40:07,450 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=25.49 vs. limit=22.5 2023-11-18 04:40:17,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=62200.0, ans=0.0 2023-11-18 04:40:27,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=62266.666666666664, ans=0.0 2023-11-18 04:40:31,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=62333.333333333336, ans=0.125 2023-11-18 04:40:32,974 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 9350, loss[loss=0.1735, simple_loss=0.1762, pruned_loss=0.07263, audio_tagging_loss=0.01273, over 14785.00 frames. ], tot_loss[loss=0.1464, simple_loss=0.1471, pruned_loss=0.05968, audio_tagging_loss=0.01319, over 3046372.33 frames. ], batch size: 55, lr: 3.56e-02, grad_scale: 32.0 2023-11-18 04:40:37,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=62333.333333333336, ans=0.0 2023-11-18 04:40:41,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=62333.333333333336, ans=0.125 2023-11-18 04:40:49,495 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.66 vs. 
limit=22.5 2023-11-18 04:40:50,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=62400.0, ans=0.2 2023-11-18 04:40:58,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=62466.666666666664, ans=0.125 2023-11-18 04:41:01,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=62466.666666666664, ans=0.0 2023-11-18 04:41:08,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=62533.333333333336, ans=0.125 2023-11-18 04:41:09,225 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=26.18 vs. limit=22.5 2023-11-18 04:41:29,988 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 9400, loss[loss=0.1816, simple_loss=0.175, pruned_loss=0.08031, audio_tagging_loss=0.01384, over 14659.00 frames. ], tot_loss[loss=0.1462, simple_loss=0.1468, pruned_loss=0.05947, audio_tagging_loss=0.01337, over 3042781.31 frames. ], batch size: 56, lr: 3.55e-02, grad_scale: 32.0 2023-11-18 04:41:37,999 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.990e+01 1.022e+02 1.168e+02 1.353e+02 2.252e+02, threshold=2.336e+02, percent-clipped=0.0 2023-11-18 04:41:48,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=62733.333333333336, ans=0.07 2023-11-18 04:41:49,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=62733.333333333336, ans=0.125 2023-11-18 04:41:55,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=62800.0, ans=0.125 2023-11-18 04:42:25,906 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 9450, loss[loss=0.1535, simple_loss=0.1519, pruned_loss=0.06501, audio_tagging_loss=0.01254, over 14374.00 frames. ], tot_loss[loss=0.1471, simple_loss=0.1475, pruned_loss=0.05994, audio_tagging_loss=0.01338, over 3046202.25 frames. ], batch size: 55, lr: 3.55e-02, grad_scale: 32.0 2023-11-18 04:42:25,919 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 04:42:30,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=63000.0, ans=0.1 2023-11-18 04:42:37,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=63066.666666666664, ans=0.0 2023-11-18 04:42:44,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=63066.666666666664, ans=0.0 2023-11-18 04:43:01,543 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.07 vs. 
limit=22.5 2023-11-18 04:43:11,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=63266.666666666664, ans=0.1 2023-11-18 04:43:11,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=63266.666666666664, ans=0.0 2023-11-18 04:43:21,288 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 9500, loss[loss=0.1823, simple_loss=0.1872, pruned_loss=0.07582, audio_tagging_loss=0.01287, over 15938.00 frames. ], tot_loss[loss=0.1476, simple_loss=0.1484, pruned_loss=0.05997, audio_tagging_loss=0.01338, over 3056226.15 frames. ], batch size: 57, lr: 3.54e-02, grad_scale: 32.0 2023-11-18 04:43:29,296 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.749e+01 1.117e+02 1.291e+02 1.412e+02 2.358e+02, threshold=2.583e+02, percent-clipped=1.0 2023-11-18 04:43:30,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=63333.333333333336, ans=0.125 2023-11-18 04:43:35,728 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.61 vs. limit=15.0 2023-11-18 04:43:44,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=63466.666666666664, ans=0.1 2023-11-18 04:43:44,340 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.55 vs. limit=15.0 2023-11-18 04:43:44,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=63466.666666666664, ans=0.125 2023-11-18 04:43:56,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=63533.333333333336, ans=0.025 2023-11-18 04:43:59,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=63533.333333333336, ans=0.05 2023-11-18 04:44:03,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=63533.333333333336, ans=0.1 2023-11-18 04:44:04,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=63533.333333333336, ans=0.1 2023-11-18 04:44:04,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=63533.333333333336, ans=0.125 2023-11-18 04:44:04,713 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.13 vs. limit=15.0 2023-11-18 04:44:12,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=63600.0, ans=0.2 2023-11-18 04:44:14,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=63600.0, ans=0.2 2023-11-18 04:44:17,716 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 9550, loss[loss=0.2155, simple_loss=0.2199, pruned_loss=0.09095, audio_tagging_loss=0.01466, over 15085.00 frames. ], tot_loss[loss=0.1475, simple_loss=0.148, pruned_loss=0.05996, audio_tagging_loss=0.0135, over 3050325.76 frames. 
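The "Exclude cut" warnings (e.g. for unbalanced/jmSuJWEIizA_0.000_1.000.wav above) drop 1-second AudioSet cuts whose 100 input frames shrink to 23 after subsampling, fewer than the 24 BPE tokens of the dummy transcript; a transducer cannot emit more tokens than it has encoder frames. A sketch of that filter follows, using a conv front-end length formula that reproduces the logged 100 → 23 mapping (an assumption consistent with the numbers, not read from the model code).

```python
def subsampled_num_frames(num_frames: int) -> int:
    # One Conv2d front-end length formula that reproduces the logged
    # 100 -> 23 mapping; assumed here for illustration.
    return ((num_frames - 7) // 2 + 1) // 2


def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Drop cuts whose subsampled length cannot cover the token sequence.
    return subsampled_num_frames(num_frames) >= num_tokens


assert subsampled_num_frames(100) == 23
assert not keep_cut(100, 24)  # matches the excluded AudioSet dummy-text cuts
```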
], batch size: 56, lr: 3.54e-02, grad_scale: 32.0 2023-11-18 04:44:19,431 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.47 vs. limit=6.0 2023-11-18 04:44:22,737 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.64 vs. limit=12.0 2023-11-18 04:44:28,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=63733.333333333336, ans=0.125 2023-11-18 04:44:29,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=63733.333333333336, ans=0.125 2023-11-18 04:44:34,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=63733.333333333336, ans=0.5 2023-11-18 04:44:40,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=63800.0, ans=0.125 2023-11-18 04:44:44,519 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:44:54,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=63866.666666666664, ans=0.0 2023-11-18 04:45:14,636 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 9600, loss[loss=0.1827, simple_loss=0.1968, pruned_loss=0.07618, audio_tagging_loss=0.00814, over 16095.00 frames. ], tot_loss[loss=0.1479, simple_loss=0.1484, pruned_loss=0.06003, audio_tagging_loss=0.01364, over 3054521.38 frames. ], batch size: 59, lr: 3.53e-02, grad_scale: 32.0 2023-11-18 04:45:21,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=64000.0, ans=0.04949747468305833 2023-11-18 04:45:22,005 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 1.076e+02 1.212e+02 1.383e+02 1.987e+02, threshold=2.424e+02, percent-clipped=0.0 2023-11-18 04:45:22,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=64000.0, ans=0.0 2023-11-18 04:45:34,808 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.32 vs. limit=15.0 2023-11-18 04:45:55,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=64200.0, ans=0.1 2023-11-18 04:45:55,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=64200.0, ans=0.125 2023-11-18 04:46:09,784 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 9650, loss[loss=0.1193, simple_loss=0.1164, pruned_loss=0.04797, audio_tagging_loss=0.01307, over 14908.00 frames. ], tot_loss[loss=0.1476, simple_loss=0.148, pruned_loss=0.06001, audio_tagging_loss=0.01356, over 3055493.60 frames. 
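The tot_loss fields above are consistent with a fixed linear combination of the three components: for batch 9600, 0.5 × 0.1484 + 0.06003 + 1.0 × 0.01364 ≈ 0.1479, the logged total. The sketch below uses weights inferred from that arithmetic; they are reconstructed from the logged rows, not read from the recipe.

```python
def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_loss_scale: float = 0.5,
                  audio_tagging_loss_scale: float = 1.0) -> float:
    # Weights inferred from the logged rows, e.g. batch 9600:
    # 0.5 * 0.1484 + 0.06003 + 1.0 * 0.01364 = 0.14787 ~= logged 0.1479.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)


assert abs(combined_loss(0.1484, 0.06003, 0.01364) - 0.1479) < 5e-4
```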
], batch size: 58, lr: 3.53e-02, grad_scale: 32.0 2023-11-18 04:46:10,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=64333.333333333336, ans=0.125 2023-11-18 04:46:21,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=64400.0, ans=0.0 2023-11-18 04:46:25,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=64400.0, ans=0.125 2023-11-18 04:46:40,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=64466.666666666664, ans=0.0 2023-11-18 04:46:52,896 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0 2023-11-18 04:46:57,799 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.83 vs. limit=15.0 2023-11-18 04:47:06,186 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 9700, loss[loss=0.1479, simple_loss=0.1535, pruned_loss=0.05934, audio_tagging_loss=0.01187, over 16043.00 frames. ], tot_loss[loss=0.1464, simple_loss=0.1473, pruned_loss=0.0594, audio_tagging_loss=0.0133, over 3050894.91 frames. ], batch size: 58, lr: 3.52e-02, grad_scale: 32.0 2023-11-18 04:47:07,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=64666.666666666664, ans=0.125 2023-11-18 04:47:13,618 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.844e+01 1.077e+02 1.252e+02 1.398e+02 2.198e+02, threshold=2.504e+02, percent-clipped=0.0 2023-11-18 04:47:27,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=64800.0, ans=0.125 2023-11-18 04:47:41,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=64866.666666666664, ans=0.0 2023-11-18 04:47:48,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=64866.666666666664, ans=0.2 2023-11-18 04:48:01,905 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 9750, loss[loss=0.1194, simple_loss=0.1254, pruned_loss=0.04562, audio_tagging_loss=0.01105, over 15109.00 frames. ], tot_loss[loss=0.1484, simple_loss=0.1495, pruned_loss=0.06056, audio_tagging_loss=0.01305, over 3054670.70 frames. ], batch size: 57, lr: 3.51e-02, grad_scale: 32.0 2023-11-18 04:48:08,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=65000.0, ans=0.1 2023-11-18 04:48:33,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=65133.333333333336, ans=0.125 2023-11-18 04:48:37,436 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.10 vs. limit=15.0 2023-11-18 04:48:46,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=65266.666666666664, ans=15.0 2023-11-18 04:48:58,257 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 9800, loss[loss=0.1229, simple_loss=0.1155, pruned_loss=0.04988, audio_tagging_loss=0.01525, over 15694.00 frames. 
], tot_loss[loss=0.1474, simple_loss=0.1486, pruned_loss=0.06006, audio_tagging_loss=0.01306, over 3044738.79 frames. ], batch size: 62, lr: 3.51e-02, grad_scale: 32.0 2023-11-18 04:49:02,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=65333.333333333336, ans=0.125 2023-11-18 04:49:05,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=65333.333333333336, ans=0.2 2023-11-18 04:49:05,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=65333.333333333336, ans=0.125 2023-11-18 04:49:06,101 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.876e+01 1.068e+02 1.214e+02 1.427e+02 2.483e+02, threshold=2.428e+02, percent-clipped=0.0 2023-11-18 04:49:09,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=65400.0, ans=0.125 2023-11-18 04:49:09,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=65400.0, ans=0.07 2023-11-18 04:49:15,320 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:49:30,124 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.77 vs. limit=22.5 2023-11-18 04:49:48,831 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 04:49:53,553 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.29 vs. limit=15.0 2023-11-18 04:49:54,134 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 9850, loss[loss=0.114, simple_loss=0.1107, pruned_loss=0.04437, audio_tagging_loss=0.01431, over 15272.00 frames. ], tot_loss[loss=0.1466, simple_loss=0.1478, pruned_loss=0.05964, audio_tagging_loss=0.01307, over 3045945.19 frames. ], batch size: 58, lr: 3.50e-02, grad_scale: 32.0 2023-11-18 04:50:07,590 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. 
limit=6.0 2023-11-18 04:50:11,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=65733.33333333333, ans=0.125 2023-11-18 04:50:15,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=65800.0, ans=0.95 2023-11-18 04:50:31,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=65866.66666666667, ans=0.2 2023-11-18 04:50:33,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=65866.66666666667, ans=0.125 2023-11-18 04:50:50,828 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 9900, loss[loss=0.1473, simple_loss=0.1544, pruned_loss=0.05792, audio_tagging_loss=0.0122, over 15261.00 frames. ], tot_loss[loss=0.1453, simple_loss=0.1464, pruned_loss=0.05901, audio_tagging_loss=0.01313, over 3037612.56 frames. ], batch size: 55, lr: 3.50e-02, grad_scale: 32.0 2023-11-18 04:50:58,263 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.989e+01 1.084e+02 1.192e+02 1.374e+02 2.032e+02, threshold=2.383e+02, percent-clipped=0.0 2023-11-18 04:51:16,971 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.97 vs. limit=22.5 2023-11-18 04:51:24,930 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.69 vs. limit=15.0 2023-11-18 04:51:38,195 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.24 vs. limit=12.0 2023-11-18 04:51:46,602 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 9950, loss[loss=0.1203, simple_loss=0.1157, pruned_loss=0.05035, audio_tagging_loss=0.01216, over 15027.00 frames. ], tot_loss[loss=0.1443, simple_loss=0.1455, pruned_loss=0.05831, audio_tagging_loss=0.01326, over 3036779.64 frames. ], batch size: 58, lr: 3.49e-02, grad_scale: 32.0 2023-11-18 04:51:57,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=66400.0, ans=0.2 2023-11-18 04:52:00,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=66400.0, ans=0.0 2023-11-18 04:52:07,265 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=25.71 vs. limit=22.5 2023-11-18 04:52:09,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=66466.66666666667, ans=0.1 2023-11-18 04:52:24,998 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2023-11-18 04:52:28,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=66533.33333333333, ans=0.1 2023-11-18 04:52:43,118 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 10000, loss[loss=0.1182, simple_loss=0.1193, pruned_loss=0.04423, audio_tagging_loss=0.01436, over 15043.00 frames. ], tot_loss[loss=0.1443, simple_loss=0.1455, pruned_loss=0.05831, audio_tagging_loss=0.01324, over 3040914.23 frames. 
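The scaling.py Whitening lines compare a per-module metric against a limit. One standard whiteness measure with this behaviour is the ratio of the mean squared eigenvalue of the channel covariance to its squared mean eigenvalue: it equals 1.0 for perfectly white features and grows as the covariance becomes ill-conditioned. The sketch below assumes that definition, which may differ in detail from what scaling.py actually computes.

```python
import torch


def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """Whiteness of features x of shape (num_frames, num_channels).

    Returns mean(eig^2) / mean(eig)^2 of the channel covariance,
    averaged over channel groups; 1.0 means perfectly white.
    """
    num_frames, num_channels = x.shape
    d = num_channels // num_groups
    metrics = []
    for g in range(num_groups):
        xg = x[:, g * d:(g + 1) * d]
        xg = xg - xg.mean(dim=0, keepdim=True)
        cov = (xg.t() @ xg) / num_frames      # (d, d) channel covariance
        eigs = torch.linalg.eigvalsh(cov)     # real eigenvalues, ascending
        metrics.append((eigs ** 2).mean() / eigs.mean() ** 2)
    return float(torch.stack(metrics).mean())
```

Under this reading, a "metric=24.19 vs. limit=22.5" line flags activations whose covariance spectrum is far from flat, which is when a whitening penalty would kick in.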
], batch size: 56, lr: 3.49e-02, grad_scale: 32.0 2023-11-18 04:52:50,995 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.963e+01 1.074e+02 1.249e+02 1.429e+02 2.064e+02, threshold=2.499e+02, percent-clipped=0.0 2023-11-18 04:52:55,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=66733.33333333333, ans=0.2 2023-11-18 04:53:13,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=66800.0, ans=0.1 2023-11-18 04:53:16,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=66866.66666666667, ans=0.1 2023-11-18 04:53:23,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=66866.66666666667, ans=0.1 2023-11-18 04:53:35,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=66933.33333333333, ans=0.0 2023-11-18 04:53:39,077 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 10050, loss[loss=0.1569, simple_loss=0.1633, pruned_loss=0.05972, audio_tagging_loss=0.01549, over 14804.00 frames. ], tot_loss[loss=0.145, simple_loss=0.1465, pruned_loss=0.05847, audio_tagging_loss=0.01323, over 3041631.32 frames. ], batch size: 56, lr: 3.48e-02, grad_scale: 32.0 2023-11-18 04:53:42,229 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.13 vs. limit=15.0 2023-11-18 04:53:46,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=67000.0, ans=0.0 2023-11-18 04:54:00,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=67133.33333333333, ans=0.125 2023-11-18 04:54:08,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=67133.33333333333, ans=0.125 2023-11-18 04:54:21,091 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.87 vs. limit=22.5 2023-11-18 04:54:24,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=67266.66666666667, ans=0.125 2023-11-18 04:54:34,772 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 10100, loss[loss=0.1583, simple_loss=0.1538, pruned_loss=0.06532, audio_tagging_loss=0.01606, over 14900.00 frames. ], tot_loss[loss=0.145, simple_loss=0.1463, pruned_loss=0.05855, audio_tagging_loss=0.01326, over 3044012.49 frames. 
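The ScheduledFloat lines report a value ("ans") as a function of batch_count, e.g. encoder_embed.convnext.layerdrop_rate holding at 0.015 while various dropout and skip rates decay. A minimal piecewise-linear schedule keyed on batch_count captures the pattern; the breakpoints below are illustrative only, not the recipe's actual schedules.

```python
import bisect


class ScheduledFloatSketch:
    """Piecewise-linear schedule over batch_count, e.g.
    ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1)) for a rate that
    decays from 0.3 to 0.1 over the first 20k batches."""

    def __init__(self, *points):
        self.xs = [float(x) for x, _ in points]
        self.ys = [float(y) for _, y in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count) - 1
        frac = (batch_count - self.xs[i]) / (self.xs[i + 1] - self.xs[i])
        return self.ys[i] + frac * (self.ys[i + 1] - self.ys[i])


dropout = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
print(dropout(67333.0))  # past the final breakpoint -> 0.1
```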
], batch size: 54, lr: 3.47e-02, grad_scale: 32.0 2023-11-18 04:54:39,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=67333.33333333333, ans=0.125 2023-11-18 04:54:39,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=67333.33333333333, ans=0.125 2023-11-18 04:54:42,865 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.958e+01 1.070e+02 1.242e+02 1.409e+02 2.518e+02, threshold=2.485e+02, percent-clipped=1.0 2023-11-18 04:54:52,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=67400.0, ans=0.125 2023-11-18 04:54:56,316 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.45 vs. limit=15.0 2023-11-18 04:55:10,635 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.47 vs. limit=22.5 2023-11-18 04:55:19,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=67600.0, ans=0.125 2023-11-18 04:55:20,753 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 04:55:31,506 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 10150, loss[loss=0.1354, simple_loss=0.1236, pruned_loss=0.0557, audio_tagging_loss=0.01785, over 16061.00 frames. ], tot_loss[loss=0.1437, simple_loss=0.1446, pruned_loss=0.05785, audio_tagging_loss=0.01353, over 3047967.30 frames. ], batch size: 65, lr: 3.47e-02, grad_scale: 32.0 2023-11-18 04:55:37,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=67666.66666666667, ans=0.125 2023-11-18 04:55:40,640 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:55:40,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=67666.66666666667, ans=0.2 2023-11-18 04:55:45,479 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.43 vs. limit=12.0 2023-11-18 04:55:57,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=67800.0, ans=0.0 2023-11-18 04:55:57,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=67800.0, ans=0.1 2023-11-18 04:55:58,583 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.53 vs. limit=12.0 2023-11-18 04:55:59,261 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 04:56:13,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=67866.66666666667, ans=10.0 2023-11-18 04:56:27,843 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 10200, loss[loss=0.1444, simple_loss=0.1493, pruned_loss=0.05806, audio_tagging_loss=0.01165, over 15503.00 frames. ], tot_loss[loss=0.1443, simple_loss=0.1451, pruned_loss=0.05816, audio_tagging_loss=0.01358, over 3051449.72 frames. ], batch size: 56, lr: 3.46e-02, grad_scale: 64.0 2023-11-18 04:56:31,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=68000.0, ans=0.125 2023-11-18 04:56:35,839 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.974e+01 1.095e+02 1.241e+02 1.478e+02 2.822e+02, threshold=2.482e+02, percent-clipped=1.0 2023-11-18 04:56:37,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=68000.0, ans=0.5 2023-11-18 04:56:41,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=68066.66666666667, ans=0.0 2023-11-18 04:56:41,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=68066.66666666667, ans=0.1 2023-11-18 04:56:49,765 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 04:56:58,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=68133.33333333333, ans=0.125 2023-11-18 04:57:17,916 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.37 vs. limit=15.0 2023-11-18 04:57:18,924 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.89 vs. limit=12.0 2023-11-18 04:57:21,713 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:57:23,699 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 10250, loss[loss=0.1693, simple_loss=0.1718, pruned_loss=0.06989, audio_tagging_loss=0.01347, over 14980.00 frames. ], tot_loss[loss=0.143, simple_loss=0.1437, pruned_loss=0.05756, audio_tagging_loss=0.01362, over 3045644.59 frames. 
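Note that grad_scale rises from 32.0 to 64.0 at batch 10200: the doubling behaviour of a dynamic fp16 loss scaler after a long run of overflow-free steps. A sketch using torch.cuda.amp.GradScaler, whose default growth_factor=2.0 matches the jump; the init_scale and growth_interval values below are illustrative, since the log does not show them.

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=2.0,        # the run starts with a small scale and grows
    growth_factor=2.0,     # doubles the scale, matching 32.0 -> 64.0
    backoff_factor=0.5,    # halves it on overflow
    growth_interval=2000,  # illustrative; not visible in the log
)

# Typical fp16 step; `model`, `optimizer`, `batch`, `compute_loss` assumed:
#   with torch.cuda.amp.autocast():
#       loss = compute_loss(model, batch)
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)
#   scaler.update()
#   print(scaler.get_scale())  # 32.0, then 64.0 after enough clean steps
```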
], batch size: 55, lr: 3.46e-02, grad_scale: 64.0 2023-11-18 04:57:35,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=68400.0, ans=0.1 2023-11-18 04:57:50,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=68466.66666666667, ans=0.125 2023-11-18 04:57:55,715 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.28 vs. limit=15.0 2023-11-18 04:57:55,749 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.45 vs. limit=15.0 2023-11-18 04:57:58,751 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.83 vs. limit=12.0 2023-11-18 04:58:03,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=68533.33333333333, ans=0.2 2023-11-18 04:58:10,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=68600.0, ans=0.0 2023-11-18 04:58:19,287 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 10300, loss[loss=0.1464, simple_loss=0.1442, pruned_loss=0.06261, audio_tagging_loss=0.01173, over 13935.00 frames. ], tot_loss[loss=0.1425, simple_loss=0.1429, pruned_loss=0.05734, audio_tagging_loss=0.01367, over 3048756.58 frames. ], batch size: 54, lr: 3.45e-02, grad_scale: 64.0 2023-11-18 04:58:27,351 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.697e+01 1.063e+02 1.210e+02 1.437e+02 2.016e+02, threshold=2.421e+02, percent-clipped=0.0 2023-11-18 04:58:27,999 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.76 vs. limit=15.0 2023-11-18 04:58:35,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=68733.33333333333, ans=0.125 2023-11-18 04:58:39,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=68733.33333333333, ans=0.1 2023-11-18 04:58:46,578 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.51 vs. limit=15.0 2023-11-18 04:58:49,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=68800.0, ans=0.125 2023-11-18 04:59:00,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=68866.66666666667, ans=0.0 2023-11-18 04:59:15,529 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 10350, loss[loss=0.1788, simple_loss=0.1744, pruned_loss=0.07904, audio_tagging_loss=0.01255, over 16381.00 frames. ], tot_loss[loss=0.1427, simple_loss=0.1434, pruned_loss=0.0573, audio_tagging_loss=0.0137, over 3050775.94 frames. 
], batch size: 63, lr: 3.45e-02, grad_scale: 64.0 2023-11-18 04:59:19,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=69000.0, ans=0.1 2023-11-18 04:59:29,369 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.47 vs. limit=15.0 2023-11-18 04:59:37,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=69133.33333333333, ans=0.125 2023-11-18 04:59:38,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=69133.33333333333, ans=0.2 2023-11-18 04:59:39,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=69133.33333333333, ans=0.0 2023-11-18 04:59:40,945 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.67 vs. limit=12.0 2023-11-18 04:59:48,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=69200.0, ans=0.0 2023-11-18 04:59:52,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=69200.0, ans=0.125 2023-11-18 05:00:11,324 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 10400, loss[loss=0.08464, simple_loss=0.06654, pruned_loss=0.02939, audio_tagging_loss=0.02198, over 15329.00 frames. ], tot_loss[loss=0.1425, simple_loss=0.1431, pruned_loss=0.05701, audio_tagging_loss=0.01393, over 3043413.72 frames. ], batch size: 59, lr: 3.44e-02, grad_scale: 64.0 2023-11-18 05:00:14,059 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.21 vs. limit=15.0 2023-11-18 05:00:18,711 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.147e+01 1.054e+02 1.220e+02 1.352e+02 2.408e+02, threshold=2.441e+02, percent-clipped=0.0 2023-11-18 05:00:21,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=69400.0, ans=0.125 2023-11-18 05:00:21,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=69400.0, ans=0.125 2023-11-18 05:00:36,847 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=26.77 vs. limit=22.5 2023-11-18 05:00:49,041 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.80 vs. limit=22.5 2023-11-18 05:01:06,944 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 10450, loss[loss=0.126, simple_loss=0.1323, pruned_loss=0.0482, audio_tagging_loss=0.01166, over 15886.00 frames. ], tot_loss[loss=0.1441, simple_loss=0.1453, pruned_loss=0.05772, audio_tagging_loss=0.01372, over 3053573.33 frames. 
], batch size: 59, lr: 3.44e-02, grad_scale: 64.0 2023-11-18 05:01:17,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=69733.33333333333, ans=0.0 2023-11-18 05:01:28,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=69800.0, ans=0.0 2023-11-18 05:01:38,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=69800.0, ans=0.0 2023-11-18 05:01:52,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=69933.33333333333, ans=0.0 2023-11-18 05:01:58,647 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.44 vs. limit=15.0 2023-11-18 05:02:03,062 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 10500, loss[loss=0.09982, simple_loss=0.09118, pruned_loss=0.04272, audio_tagging_loss=0.01151, over 16550.00 frames. ], tot_loss[loss=0.1434, simple_loss=0.1443, pruned_loss=0.05781, audio_tagging_loss=0.01342, over 3056169.35 frames. ], batch size: 65, lr: 3.43e-02, grad_scale: 64.0 2023-11-18 05:02:10,950 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.463e+01 1.096e+02 1.231e+02 1.432e+02 2.125e+02, threshold=2.461e+02, percent-clipped=0.0 2023-11-18 05:02:14,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=70066.66666666667, ans=0.125 2023-11-18 05:02:38,115 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.379e+01 2023-11-18 05:02:39,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=70200.0, ans=0.0 2023-11-18 05:02:53,624 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.44 vs. limit=15.0 2023-11-18 05:02:55,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=70266.66666666667, ans=0.0 2023-11-18 05:02:58,207 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.94 vs. limit=15.0 2023-11-18 05:02:58,454 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 10550, loss[loss=0.1514, simple_loss=0.1646, pruned_loss=0.05712, audio_tagging_loss=0.01199, over 15212.00 frames. ], tot_loss[loss=0.1427, simple_loss=0.1436, pruned_loss=0.05758, audio_tagging_loss=0.01326, over 3055949.03 frames. 
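Note: in each `Clipping_scale=2.0, grad-norm quartiles ...` record from optim.py, the reported threshold is the clipping scale times the logged median grad norm (e.g. 2.0 x 1.231e+02 ~ 2.461e+02 in the record above), and `percent-clipped` tracks how often recent gradients exceeded it. A minimal, illustrative version of median-based adaptive clipping, not icefall's actual optimizer code:

```python
from collections import deque
import torch

class MedianGradClipper:
    """Clip gradients to clipping_scale * median of recent grad norms.

    Illustrates the statistic behind the logged
    `grad-norm quartiles ... threshold=... percent-clipped=...` records;
    the real optimizer folds this into its step.
    """

    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)
        self.num_clipped = 0

    def __call__(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        # Total L2 norm over all parameter gradients.
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        if norm > threshold:
            self.num_clipped += 1
            for p in params:
                p.grad.mul_(threshold / norm)
        return threshold

model = torch.nn.Linear(10, 10)
model(torch.randn(4, 10)).sum().backward()
clipper = MedianGradClipper()
print(clipper(model.parameters()))
```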
], batch size: 57, lr: 3.43e-02, grad_scale: 64.0 2023-11-18 05:03:02,930 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 05:03:05,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=70333.33333333333, ans=0.2 2023-11-18 05:03:11,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=70400.0, ans=0.125 2023-11-18 05:03:14,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=70400.0, ans=0.1 2023-11-18 05:03:14,508 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.16 vs. limit=22.5 2023-11-18 05:03:14,515 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0 2023-11-18 05:03:19,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=70466.66666666667, ans=0.125 2023-11-18 05:03:32,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=70533.33333333333, ans=0.125 2023-11-18 05:03:53,335 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 10600, loss[loss=0.09953, simple_loss=0.09768, pruned_loss=0.03234, audio_tagging_loss=0.01834, over 14969.00 frames. ], tot_loss[loss=0.1412, simple_loss=0.1423, pruned_loss=0.05672, audio_tagging_loss=0.01328, over 3056560.44 frames. ], batch size: 57, lr: 3.42e-02, grad_scale: 64.0 2023-11-18 05:03:53,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=70666.66666666667, ans=0.0 2023-11-18 05:03:55,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=70666.66666666667, ans=0.125 2023-11-18 05:04:01,191 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.606e+01 1.084e+02 1.194e+02 1.358e+02 2.173e+02, threshold=2.389e+02, percent-clipped=0.0 2023-11-18 05:04:10,550 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.77 vs. limit=15.0 2023-11-18 05:04:11,363 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.81 vs. limit=15.0 2023-11-18 05:04:32,956 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.30 vs. limit=22.5 2023-11-18 05:04:43,718 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 05:04:49,522 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 10650, loss[loss=0.1424, simple_loss=0.1542, pruned_loss=0.05628, audio_tagging_loss=0.008988, over 13622.00 frames. ], tot_loss[loss=0.1418, simple_loss=0.1431, pruned_loss=0.05693, audio_tagging_loss=0.01328, over 3054426.47 frames. 
], batch size: 53, lr: 3.41e-02, grad_scale: 64.0 2023-11-18 05:05:03,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=71066.66666666667, ans=0.025 2023-11-18 05:05:30,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=71200.0, ans=0.125 2023-11-18 05:05:37,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=71266.66666666667, ans=0.1 2023-11-18 05:05:41,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=71266.66666666667, ans=0.125 2023-11-18 05:05:45,719 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 10700, loss[loss=0.2005, simple_loss=0.2132, pruned_loss=0.08421, audio_tagging_loss=0.009639, over 16274.00 frames. ], tot_loss[loss=0.1425, simple_loss=0.1443, pruned_loss=0.05727, audio_tagging_loss=0.01306, over 3046949.47 frames. ], batch size: 57, lr: 3.41e-02, grad_scale: 64.0 2023-11-18 05:05:52,954 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.108e+01 1.048e+02 1.189e+02 1.344e+02 2.146e+02, threshold=2.378e+02, percent-clipped=0.0 2023-11-18 05:06:03,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=71400.0, ans=0.0 2023-11-18 05:06:23,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=71533.33333333333, ans=0.125 2023-11-18 05:06:34,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=71600.0, ans=0.1 2023-11-18 05:06:38,685 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.72 vs. limit=5.0 2023-11-18 05:06:39,924 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 10750, loss[loss=0.1491, simple_loss=0.1639, pruned_loss=0.05778, audio_tagging_loss=0.009395, over 15998.00 frames. ], tot_loss[loss=0.1425, simple_loss=0.1447, pruned_loss=0.05717, audio_tagging_loss=0.01292, over 3049566.36 frames. ], batch size: 60, lr: 3.40e-02, grad_scale: 64.0 2023-11-18 05:07:26,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=71933.33333333333, ans=0.125 2023-11-18 05:07:28,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=71933.33333333333, ans=0.125 2023-11-18 05:07:35,448 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 10800, loss[loss=0.1131, simple_loss=0.1077, pruned_loss=0.04263, audio_tagging_loss=0.01668, over 15758.00 frames. ], tot_loss[loss=0.1415, simple_loss=0.1432, pruned_loss=0.05684, audio_tagging_loss=0.01306, over 3044905.28 frames. 
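Note: each per-batch record combines a simple (linear) transducer loss, a pruned RNN-T loss, and the audio-tagging distillation loss. After warm-up the logged totals are consistent with loss = 0.5*simple_loss + pruned_loss + 1.0*audio_tagging_loss (e.g. batch 10800 above: 0.5*0.1432 + 0.05684 + 0.01306 ~ 0.1415). A sketch of that combination; the warm-up ramp below follows icefall's usual pruned-transducer recipes and is an assumption:

```python
def combine_losses(
    simple_loss: float,
    pruned_loss: float,
    audio_tagging_loss: float,
    batch_idx_train: int,
    warm_step: int = 2000,
    simple_loss_scale: float = 0.5,
    audio_tagging_loss_scale: float = 1.0,
) -> float:
    """Weighted total matching the logged records after warm-up.

    During warm-up the simple loss dominates while the pruned loss is
    ramped in; the 1.0 -> simple_loss_scale and 0.1 -> 1.0 ramps are an
    assumption, not taken from the recipe verbatim.
    """
    if batch_idx_train >= warm_step:
        s, p = simple_loss_scale, 1.0
    else:
        frac = batch_idx_train / warm_step
        s = 1.0 - frac * (1.0 - simple_loss_scale)
        p = 0.1 + 0.9 * frac
    return s * simple_loss + p * pruned_loss + audio_tagging_loss_scale * audio_tagging_loss

# Batch 10800 from the log: 0.5*0.1432 + 0.05684 + 0.01306 ~= 0.1415
print(combine_losses(0.1432, 0.05684, 0.01306, batch_idx_train=10800))
```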
], batch size: 59, lr: 3.40e-02, grad_scale: 64.0 2023-11-18 05:07:43,298 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.252e+01 1.082e+02 1.179e+02 1.367e+02 2.142e+02, threshold=2.358e+02, percent-clipped=0.0 2023-11-18 05:07:44,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=72000.0, ans=0.125 2023-11-18 05:07:54,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=72066.66666666667, ans=0.0 2023-11-18 05:08:09,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=72200.0, ans=0.125 2023-11-18 05:08:31,989 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 10850, loss[loss=0.2165, simple_loss=0.2178, pruned_loss=0.09349, audio_tagging_loss=0.01415, over 16124.00 frames. ], tot_loss[loss=0.1419, simple_loss=0.1434, pruned_loss=0.05705, audio_tagging_loss=0.01316, over 3038968.91 frames. ], batch size: 56, lr: 3.39e-02, grad_scale: 64.0 2023-11-18 05:08:52,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=72466.66666666667, ans=0.125 2023-11-18 05:08:53,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=72466.66666666667, ans=0.125 2023-11-18 05:08:57,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=72466.66666666667, ans=0.125 2023-11-18 05:08:58,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=72466.66666666667, ans=0.2 2023-11-18 05:09:05,296 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.83 vs. limit=22.5 2023-11-18 05:09:13,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=72533.33333333333, ans=0.1 2023-11-18 05:09:21,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=72600.0, ans=0.2 2023-11-18 05:09:24,807 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 05:09:26,862 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 10900, loss[loss=0.1154, simple_loss=0.1231, pruned_loss=0.03922, audio_tagging_loss=0.01464, over 14852.00 frames. ], tot_loss[loss=0.142, simple_loss=0.1433, pruned_loss=0.05704, audio_tagging_loss=0.01329, over 3042852.20 frames. 
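Note: the WARNING above drops an AudioSet placeholder cut because, after the 4x convolutional subsampling, it has fewer frames (23) than BPE tokens (24), which would make the transducer loss undefined. A sketch of such a validity check; the frontend arithmetic `((T - 7) // 2 + 1) // 2` is an assumption that happens to reproduce the logged 100 -> 23 mapping:

```python
import sentencepiece as spm  # tokenizer used by the recipe's bpe.model

def is_valid_cut(num_frames: int, text: str, sp) -> bool:
    """Reject cuts whose subsampled length is shorter than the token count.

    Mirrors the logged exclusion: 100 frames -> 23 after subsampling,
    but 24 BPE tokens, so the cut cannot be aligned and is skipped.
    """
    frames_after = ((num_frames - 7) // 2 + 1) // 2
    tokens = sp.encode(text, out_type=str)
    if frames_after < len(tokens):
        print(
            f"Exclude cut: {num_frames} frames -> {frames_after} after "
            f"subsampling, but {len(tokens)} tokens"
        )
        return False
    return True

# Assuming the BPE model from the experiment directory is available:
# sp = spm.SentencePieceProcessor(model_file="data/lang_bpe_500/bpe.model")
# is_valid_cut(100, "Dummy text added as a place holder. ...", sp)  # -> False
```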
], batch size: 57, lr: 3.39e-02, grad_scale: 64.0 2023-11-18 05:09:34,701 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.986e+01 1.097e+02 1.219e+02 1.380e+02 2.178e+02, threshold=2.437e+02, percent-clipped=0.0 2023-11-18 05:09:46,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=72733.33333333333, ans=0.125 2023-11-18 05:10:13,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=72933.33333333333, ans=0.09899494936611666 2023-11-18 05:10:22,383 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 10950, loss[loss=0.148, simple_loss=0.1492, pruned_loss=0.06212, audio_tagging_loss=0.01127, over 14560.00 frames. ], tot_loss[loss=0.1425, simple_loss=0.144, pruned_loss=0.05724, audio_tagging_loss=0.01328, over 3036889.35 frames. ], batch size: 55, lr: 3.38e-02, grad_scale: 64.0 2023-11-18 05:10:31,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=73000.0, ans=0.125 2023-11-18 05:10:31,759 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.87 vs. limit=15.0 2023-11-18 05:10:33,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=73066.66666666667, ans=10.0 2023-11-18 05:10:56,196 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.02 vs. limit=10.0 2023-11-18 05:11:15,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=73266.66666666667, ans=0.1 2023-11-18 05:11:18,769 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 11000, loss[loss=0.1535, simple_loss=0.1668, pruned_loss=0.06093, audio_tagging_loss=0.009155, over 15939.00 frames. ], tot_loss[loss=0.1418, simple_loss=0.1437, pruned_loss=0.05672, audio_tagging_loss=0.0132, over 3041291.97 frames. ], batch size: 57, lr: 3.38e-02, grad_scale: 64.0 2023-11-18 05:11:26,684 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.286e+01 1.063e+02 1.239e+02 1.487e+02 2.361e+02, threshold=2.479e+02, percent-clipped=0.0 2023-11-18 05:11:28,862 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 05:11:33,501 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.77 vs. 
limit=6.0 2023-11-18 05:11:40,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=73466.66666666667, ans=0.0 2023-11-18 05:11:51,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=73533.33333333333, ans=0.125 2023-11-18 05:11:53,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=73533.33333333333, ans=0.1 2023-11-18 05:11:58,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=73533.33333333333, ans=0.0 2023-11-18 05:12:08,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=73600.0, ans=0.1 2023-11-18 05:12:08,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=73600.0, ans=0.125 2023-11-18 05:12:13,948 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 11050, loss[loss=0.1426, simple_loss=0.1504, pruned_loss=0.05609, audio_tagging_loss=0.01124, over 16389.00 frames. ], tot_loss[loss=0.1439, simple_loss=0.1456, pruned_loss=0.05782, audio_tagging_loss=0.01327, over 3049226.69 frames. ], batch size: 61, lr: 3.37e-02, grad_scale: 64.0 2023-11-18 05:12:18,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=73666.66666666667, ans=0.125 2023-11-18 05:12:23,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=73666.66666666667, ans=0.07 2023-11-18 05:13:09,734 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 11100, loss[loss=0.1546, simple_loss=0.1462, pruned_loss=0.06761, audio_tagging_loss=0.01384, over 15778.00 frames. ], tot_loss[loss=0.1451, simple_loss=0.1468, pruned_loss=0.05836, audio_tagging_loss=0.01333, over 3060154.12 frames. ], batch size: 60, lr: 3.37e-02, grad_scale: 64.0 2023-11-18 05:13:17,725 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.064e+01 1.115e+02 1.316e+02 1.523e+02 2.373e+02, threshold=2.632e+02, percent-clipped=0.0 2023-11-18 05:13:20,121 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.093e+01 2023-11-18 05:13:25,186 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.22 vs. limit=15.0 2023-11-18 05:13:29,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=74066.66666666667, ans=0.0 2023-11-18 05:13:45,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=74200.0, ans=0.125 2023-11-18 05:13:48,806 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.48 vs. limit=15.0 2023-11-18 05:14:01,165 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.32 vs. 
limit=15.0 2023-11-18 05:14:02,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=74266.66666666667, ans=0.125 2023-11-18 05:14:06,320 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 11150, loss[loss=0.1283, simple_loss=0.126, pruned_loss=0.04899, audio_tagging_loss=0.0163, over 15181.00 frames. ], tot_loss[loss=0.1443, simple_loss=0.1455, pruned_loss=0.05802, audio_tagging_loss=0.01349, over 3059131.66 frames. ], batch size: 56, lr: 3.36e-02, grad_scale: 64.0 2023-11-18 05:14:13,404 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.73 vs. limit=12.0 2023-11-18 05:14:15,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=74400.0, ans=0.0 2023-11-18 05:14:38,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=74533.33333333333, ans=0.125 2023-11-18 05:14:55,408 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.95 vs. limit=15.0 2023-11-18 05:15:01,599 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 11200, loss[loss=0.1132, simple_loss=0.1127, pruned_loss=0.04147, audio_tagging_loss=0.01535, over 15406.00 frames. ], tot_loss[loss=0.1444, simple_loss=0.1458, pruned_loss=0.05796, audio_tagging_loss=0.01348, over 3063136.17 frames. ], batch size: 58, lr: 3.36e-02, grad_scale: 64.0 2023-11-18 05:15:04,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=74666.66666666667, ans=0.0 2023-11-18 05:15:09,623 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.922e+01 1.084e+02 1.213e+02 1.367e+02 1.851e+02, threshold=2.426e+02, percent-clipped=0.0 2023-11-18 05:15:18,057 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.73 vs. limit=22.5 2023-11-18 05:15:18,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=74733.33333333333, ans=0.0 2023-11-18 05:15:57,785 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 11250, loss[loss=0.15, simple_loss=0.1513, pruned_loss=0.06071, audio_tagging_loss=0.01362, over 16430.00 frames. ], tot_loss[loss=0.1425, simple_loss=0.1439, pruned_loss=0.05711, audio_tagging_loss=0.01343, over 3058163.10 frames. ], batch size: 61, lr: 3.35e-02, grad_scale: 64.0 2023-11-18 05:16:10,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=75066.66666666667, ans=0.125 2023-11-18 05:16:44,821 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0 2023-11-18 05:16:48,350 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.88 vs. limit=15.0 2023-11-18 05:16:53,040 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 11300, loss[loss=0.1641, simple_loss=0.1771, pruned_loss=0.06552, audio_tagging_loss=0.01001, over 15397.00 frames. ], tot_loss[loss=0.1409, simple_loss=0.1424, pruned_loss=0.05654, audio_tagging_loss=0.01321, over 3048693.93 frames. 
], batch size: 56, lr: 3.35e-02, grad_scale: 64.0 2023-11-18 05:16:53,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=75333.33333333333, ans=0.125 2023-11-18 05:16:56,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=75333.33333333333, ans=0.1 2023-11-18 05:17:00,934 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.188e+01 1.067e+02 1.239e+02 1.530e+02 2.211e+02, threshold=2.479e+02, percent-clipped=0.0 2023-11-18 05:17:21,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=75466.66666666667, ans=10.0 2023-11-18 05:17:23,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=75466.66666666667, ans=0.0 2023-11-18 05:17:48,715 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 11350, loss[loss=0.1318, simple_loss=0.1326, pruned_loss=0.05049, audio_tagging_loss=0.01508, over 15621.00 frames. ], tot_loss[loss=0.1423, simple_loss=0.144, pruned_loss=0.05721, audio_tagging_loss=0.01312, over 3057458.76 frames. ], batch size: 60, lr: 3.34e-02, grad_scale: 64.0 2023-11-18 05:18:34,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=75933.33333333333, ans=0.125 2023-11-18 05:18:44,519 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.15 vs. limit=15.0 2023-11-18 05:18:45,300 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 11400, loss[loss=0.1333, simple_loss=0.1295, pruned_loss=0.05241, audio_tagging_loss=0.0161, over 14358.00 frames. ], tot_loss[loss=0.1423, simple_loss=0.1442, pruned_loss=0.05716, audio_tagging_loss=0.01306, over 3050353.37 frames. ], batch size: 56, lr: 3.34e-02, grad_scale: 64.0 2023-11-18 05:18:52,632 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.156e+01 1.039e+02 1.156e+02 1.287e+02 1.628e+02, threshold=2.311e+02, percent-clipped=0.0 2023-11-18 05:18:56,596 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 05:19:03,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=76066.66666666667, ans=0.5 2023-11-18 05:19:07,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=76133.33333333333, ans=0.1 2023-11-18 05:19:09,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=76133.33333333333, ans=0.2 2023-11-18 05:19:17,787 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0 2023-11-18 05:19:40,944 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 11450, loss[loss=0.1632, simple_loss=0.1767, pruned_loss=0.06347, audio_tagging_loss=0.01139, over 16069.00 frames. ], tot_loss[loss=0.1412, simple_loss=0.143, pruned_loss=0.05673, audio_tagging_loss=0.01293, over 3050564.87 frames. 
], batch size: 57, lr: 3.33e-02, grad_scale: 64.0 2023-11-18 05:19:43,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=76333.33333333333, ans=0.1 2023-11-18 05:19:50,700 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.97 vs. limit=15.0 2023-11-18 05:19:56,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=76400.0, ans=0.125 2023-11-18 05:19:56,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=76400.0, ans=0.2 2023-11-18 05:20:13,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=76533.33333333333, ans=0.0 2023-11-18 05:20:17,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=76533.33333333333, ans=0.0 2023-11-18 05:20:31,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=76600.0, ans=0.1 2023-11-18 05:20:36,076 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 11500, loss[loss=0.122, simple_loss=0.1268, pruned_loss=0.04433, audio_tagging_loss=0.01432, over 14546.00 frames. ], tot_loss[loss=0.1411, simple_loss=0.1431, pruned_loss=0.05659, audio_tagging_loss=0.01294, over 3049350.46 frames. ], batch size: 56, lr: 3.33e-02, grad_scale: 64.0 2023-11-18 05:20:43,407 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.208e+01 1.030e+02 1.194e+02 1.379e+02 2.068e+02, threshold=2.389e+02, percent-clipped=0.0 2023-11-18 05:20:45,778 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.111e+01 2023-11-18 05:20:53,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=76733.33333333333, ans=0.1 2023-11-18 05:21:09,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=76866.66666666667, ans=0.5 2023-11-18 05:21:12,422 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 05:21:31,748 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 11550, loss[loss=0.1199, simple_loss=0.1331, pruned_loss=0.04022, audio_tagging_loss=0.01314, over 16687.00 frames. ], tot_loss[loss=0.1425, simple_loss=0.1447, pruned_loss=0.0573, audio_tagging_loss=0.01282, over 3055120.31 frames. ], batch size: 63, lr: 3.32e-02, grad_scale: 64.0 2023-11-18 05:22:01,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=77133.33333333333, ans=0.2 2023-11-18 05:22:05,667 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.62 vs. limit=6.0 2023-11-18 05:22:06,033 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 05:22:24,475 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.67 vs. limit=12.0 2023-11-18 05:22:28,059 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 11600, loss[loss=0.1566, simple_loss=0.1603, pruned_loss=0.06438, audio_tagging_loss=0.01205, over 14977.00 frames. ], tot_loss[loss=0.1416, simple_loss=0.1442, pruned_loss=0.05659, audio_tagging_loss=0.01289, over 3051430.10 frames. ], batch size: 55, lr: 3.32e-02, grad_scale: 64.0 2023-11-18 05:22:35,972 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.930e+01 1.030e+02 1.201e+02 1.372e+02 2.300e+02, threshold=2.402e+02, percent-clipped=0.0 2023-11-18 05:22:39,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=77400.0, ans=0.125 2023-11-18 05:22:50,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=77466.66666666667, ans=0.1 2023-11-18 05:23:05,053 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.01 vs. limit=10.0 2023-11-18 05:23:22,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=77666.66666666667, ans=0.125 2023-11-18 05:23:23,711 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 11650, loss[loss=0.1165, simple_loss=0.1135, pruned_loss=0.0426, audio_tagging_loss=0.01716, over 15427.00 frames. ], tot_loss[loss=0.1415, simple_loss=0.1443, pruned_loss=0.05648, audio_tagging_loss=0.0129, over 3043417.33 frames. ], batch size: 60, lr: 3.31e-02, grad_scale: 64.0 2023-11-18 05:23:24,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=77666.66666666667, ans=0.0 2023-11-18 05:23:25,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=77666.66666666667, ans=0.125 2023-11-18 05:24:00,526 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.82 vs. limit=22.5 2023-11-18 05:24:18,905 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 11700, loss[loss=0.09962, simple_loss=0.1037, pruned_loss=0.03345, audio_tagging_loss=0.01434, over 14893.00 frames. ], tot_loss[loss=0.1395, simple_loss=0.1421, pruned_loss=0.05534, audio_tagging_loss=0.01311, over 3043665.91 frames. 
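Note: the slow lr decay across these records (3.46e-02 at the top of this stretch down to 3.14e-02 by epoch 2, with an extra drop at the epoch boundary) matches the shape of icefall's Eden schedule. A sketch under the assumed settings base_lr=0.045, lr_batches=7500, lr_epochs=3.5; treat the output as approximate, since the logged lr also reflects the optimizer's own step counting:

```python
def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    """Eden-style decay in both batch and epoch, as in Zipformer recipes.

    lr = base_lr * ((b^2 + B^2) / B^2)^-0.25 * ((e^2 + E^2) / E^2)^-0.25
    """
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Roughly reproduces the logged magnitudes around epoch 1, batch ~11000:
print(eden_lr(0.045, batch=11000, epoch=1))  # ~3.3e-02
```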
], batch size: 57, lr: 3.31e-02, grad_scale: 64.0 2023-11-18 05:24:19,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=78000.0, ans=0.2 2023-11-18 05:24:24,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=78000.0, ans=0.0 2023-11-18 05:24:26,789 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.375e+01 1.130e+02 1.304e+02 1.460e+02 2.076e+02, threshold=2.607e+02, percent-clipped=0.0 2023-11-18 05:24:32,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=78066.66666666667, ans=0.0 2023-11-18 05:24:34,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=78066.66666666667, ans=0.0 2023-11-18 05:24:37,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=78066.66666666667, ans=0.125 2023-11-18 05:24:45,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=78133.33333333333, ans=0.125 2023-11-18 05:25:07,556 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.73 vs. limit=15.0 2023-11-18 05:25:11,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=78266.66666666667, ans=0.0 2023-11-18 05:25:14,874 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 11750, loss[loss=0.1302, simple_loss=0.1306, pruned_loss=0.053, audio_tagging_loss=0.01184, over 15014.00 frames. ], tot_loss[loss=0.1403, simple_loss=0.1424, pruned_loss=0.05587, audio_tagging_loss=0.0132, over 3047069.46 frames. ], batch size: 57, lr: 3.30e-02, grad_scale: 64.0 2023-11-18 05:25:37,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=78466.66666666667, ans=0.125 2023-11-18 05:25:38,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=78466.66666666667, ans=0.1 2023-11-18 05:25:53,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=78533.33333333333, ans=0.125 2023-11-18 05:25:53,776 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.74 vs. limit=15.0 2023-11-18 05:25:54,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=78533.33333333333, ans=0.09899494936611666 2023-11-18 05:26:01,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=78600.0, ans=0.125 2023-11-18 05:26:11,066 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 11800, loss[loss=0.1299, simple_loss=0.1299, pruned_loss=0.04663, audio_tagging_loss=0.01828, over 15334.00 frames. ], tot_loss[loss=0.1407, simple_loss=0.1429, pruned_loss=0.05597, audio_tagging_loss=0.01327, over 3046246.69 frames. 
], batch size: 58, lr: 3.30e-02, grad_scale: 32.0 2023-11-18 05:26:15,777 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.71 vs. limit=22.5 2023-11-18 05:26:19,540 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.783e+01 1.101e+02 1.270e+02 1.502e+02 2.355e+02, threshold=2.541e+02, percent-clipped=0.0 2023-11-18 05:26:27,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=78733.33333333333, ans=0.125 2023-11-18 05:26:29,173 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.83 vs. limit=15.0 2023-11-18 05:26:46,075 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 05:26:49,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=78866.66666666667, ans=0.0 2023-11-18 05:26:51,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=78866.66666666667, ans=0.125 2023-11-18 05:26:59,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=78933.33333333333, ans=0.125 2023-11-18 05:27:03,875 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.64 vs. limit=15.0 2023-11-18 05:27:06,415 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 11850, loss[loss=0.1745, simple_loss=0.1808, pruned_loss=0.07345, audio_tagging_loss=0.01062, over 15486.00 frames. ], tot_loss[loss=0.1419, simple_loss=0.144, pruned_loss=0.0566, audio_tagging_loss=0.01332, over 3042099.56 frames. ], batch size: 57, lr: 3.29e-02, grad_scale: 32.0 2023-11-18 05:27:30,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=79133.33333333333, ans=0.0 2023-11-18 05:28:02,172 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 11900, loss[loss=0.1089, simple_loss=0.1114, pruned_loss=0.03734, audio_tagging_loss=0.01587, over 14940.00 frames. ], tot_loss[loss=0.1402, simple_loss=0.1424, pruned_loss=0.05548, audio_tagging_loss=0.01349, over 3041237.18 frames. ], batch size: 56, lr: 3.29e-02, grad_scale: 32.0 2023-11-18 05:28:02,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=79333.33333333333, ans=0.07 2023-11-18 05:28:05,433 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. limit=6.0 2023-11-18 05:28:07,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=79333.33333333333, ans=0.2 2023-11-18 05:28:11,751 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.709e+01 1.049e+02 1.249e+02 1.472e+02 4.248e+02, threshold=2.498e+02, percent-clipped=1.0 2023-11-18 05:28:19,182 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.81 vs. 
limit=22.5 2023-11-18 05:28:19,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=79400.0, ans=0.125 2023-11-18 05:28:33,871 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.45 vs. limit=12.0 2023-11-18 05:28:43,738 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 05:28:58,968 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 11950, loss[loss=0.1766, simple_loss=0.1894, pruned_loss=0.07462, audio_tagging_loss=0.007288, over 15821.00 frames. ], tot_loss[loss=0.1395, simple_loss=0.1415, pruned_loss=0.05511, audio_tagging_loss=0.01359, over 3045594.58 frames. ], batch size: 58, lr: 3.28e-02, grad_scale: 32.0 2023-11-18 05:29:08,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=79733.33333333333, ans=0.2 2023-11-18 05:29:10,132 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.39 vs. limit=22.5 2023-11-18 05:29:20,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=79800.0, ans=0.0 2023-11-18 05:29:26,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=79800.0, ans=0.125 2023-11-18 05:29:30,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=79866.66666666667, ans=0.125 2023-11-18 05:29:55,519 INFO [train_asr.py:1115] (3/4) Epoch 1, batch 12000, loss[loss=0.1647, simple_loss=0.1691, pruned_loss=0.06776, audio_tagging_loss=0.01236, over 15318.00 frames. ], tot_loss[loss=0.1403, simple_loss=0.1427, pruned_loss=0.05534, audio_tagging_loss=0.01365, over 3048582.87 frames. ], batch size: 56, lr: 3.28e-02, grad_scale: 16.0 2023-11-18 05:29:55,520 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-18 05:30:31,620 INFO [train_asr.py:1147] (3/4) Epoch 1, validation: loss=0.09272, simple_loss=0.07249, pruned_loss=0.01766, audio_tagging_loss=0.03882, over 4681554.00 frames. 2023-11-18 05:30:31,621 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-18 05:30:42,376 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.048e+01 1.066e+02 1.219e+02 1.451e+02 6.762e+02, threshold=2.438e+02, percent-clipped=1.0 2023-11-18 05:30:50,791 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.054e+01 2023-11-18 05:30:51,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=80133.33333333333, ans=0.0 2023-11-18 05:31:38,184 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 0, loss[loss=0.1791, simple_loss=0.177, pruned_loss=0.06481, audio_tagging_loss=0.02575, over 16286.00 frames. ], tot_loss[loss=0.1791, simple_loss=0.177, pruned_loss=0.06481, audio_tagging_loss=0.02575, over 16286.00 frames. ], batch size: 58, lr: 3.21e-02, grad_scale: 32.0 2023-11-18 05:31:38,185 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-18 05:32:10,419 INFO [train_asr.py:1147] (3/4) Epoch 2, validation: loss=0.09083, simple_loss=0.07252, pruned_loss=0.0178, audio_tagging_loss=0.03677, over 4681554.00 frames. 
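Note: grad_scale halves from 64.0 to 32.0 around batch 11800 and to 16.0 by batch 12000. With fp16 training, the loss scaler backs off when it sees inf/nan gradients and grows the scale back after a run of clean steps. A sketch using PyTorch's stock GradScaler (icefall wraps its own scaling logic, so the constants here are illustrative; needs a CUDA device):

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=64.0,       # matches the grad_scale seen through most of epoch 1
    backoff_factor=0.5,    # halve on inf/nan, e.g. 64 -> 32 -> 16 as in the log
    growth_factor=2.0,
    growth_interval=2000,  # grow back after this many clean steps
)

model = torch.nn.Linear(10, 10).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

with torch.autocast("cuda", dtype=torch.float16):
    loss = model(torch.randn(4, 10, device="cuda")).sum()
scaler.scale(loss).backward()  # backward on the scaled loss
scaler.step(opt)               # unscales; skips the step if grads are inf/nan
scaler.update()                # backs off or grows the scale
print(scaler.get_scale())
```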
2023-11-18 05:32:10,420 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-18 05:32:13,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=80160.0, ans=15.0 2023-11-18 05:32:21,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=80226.66666666667, ans=0.125 2023-11-18 05:32:21,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=80226.66666666667, ans=0.125 2023-11-18 05:32:35,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=80293.33333333333, ans=0.2 2023-11-18 05:32:35,837 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.15 vs. limit=15.0 2023-11-18 05:32:46,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=80360.0, ans=0.125 2023-11-18 05:33:00,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=80426.66666666667, ans=0.125 2023-11-18 05:33:05,837 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 50, loss[loss=0.2206, simple_loss=0.2206, pruned_loss=0.09146, audio_tagging_loss=0.01887, over 16655.00 frames. ], tot_loss[loss=0.1549, simple_loss=0.1468, pruned_loss=0.05711, audio_tagging_loss=0.02439, over 698651.68 frames. ], batch size: 57, lr: 3.21e-02, grad_scale: 32.0 2023-11-18 05:33:21,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=80560.0, ans=15.0 2023-11-18 05:33:25,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=80560.0, ans=0.0 2023-11-18 05:33:46,329 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 9.563e+01 1.150e+02 1.281e+02 1.485e+02 2.294e+02, threshold=2.563e+02, percent-clipped=0.0 2023-11-18 05:33:51,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=80760.0, ans=0.1 2023-11-18 05:33:57,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=80760.0, ans=0.125 2023-11-18 05:34:01,930 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 100, loss[loss=0.0921, simple_loss=0.06914, pruned_loss=0.02752, audio_tagging_loss=0.03001, over 14567.00 frames. ], tot_loss[loss=0.1531, simple_loss=0.1464, pruned_loss=0.05617, audio_tagging_loss=0.02373, over 1218381.84 frames. 
], batch size: 56, lr: 3.20e-02, grad_scale: 32.0 2023-11-18 05:34:05,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=80826.66666666667, ans=0.025 2023-11-18 05:34:06,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=80826.66666666667, ans=0.125 2023-11-18 05:34:46,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=81093.33333333333, ans=0.125 2023-11-18 05:34:51,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=81093.33333333333, ans=0.0 2023-11-18 05:34:57,986 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 150, loss[loss=0.1218, simple_loss=0.1144, pruned_loss=0.04673, audio_tagging_loss=0.01789, over 14880.00 frames. ], tot_loss[loss=0.1474, simple_loss=0.1423, pruned_loss=0.05448, audio_tagging_loss=0.02179, over 1621172.28 frames. ], batch size: 56, lr: 3.20e-02, grad_scale: 32.0 2023-11-18 05:35:16,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=81226.66666666667, ans=0.125 2023-11-18 05:35:19,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=81293.33333333333, ans=0.125 2023-11-18 05:35:23,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=81293.33333333333, ans=0.125 2023-11-18 05:35:38,723 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.932e+01 1.103e+02 1.211e+02 1.385e+02 1.770e+02, threshold=2.422e+02, percent-clipped=0.0 2023-11-18 05:35:54,753 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 200, loss[loss=0.1602, simple_loss=0.1544, pruned_loss=0.06771, audio_tagging_loss=0.01532, over 15891.00 frames. ], tot_loss[loss=0.148, simple_loss=0.1452, pruned_loss=0.05619, audio_tagging_loss=0.01916, over 1941368.63 frames. ], batch size: 58, lr: 3.19e-02, grad_scale: 32.0 2023-11-18 05:35:57,437 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.48 vs. limit=22.5 2023-11-18 05:36:28,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=81693.33333333333, ans=0.125 2023-11-18 05:36:51,495 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 250, loss[loss=0.09745, simple_loss=0.0987, pruned_loss=0.03322, audio_tagging_loss=0.01488, over 15204.00 frames. ], tot_loss[loss=0.1458, simple_loss=0.1451, pruned_loss=0.05599, audio_tagging_loss=0.01728, over 2191262.03 frames. ], batch size: 57, lr: 3.19e-02, grad_scale: 32.0 2023-11-18 05:36:59,147 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.92 vs. 
limit=15.0 2023-11-18 05:37:09,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=81893.33333333333, ans=0.125 2023-11-18 05:37:10,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=81893.33333333333, ans=0.125 2023-11-18 05:37:13,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=81960.0, ans=0.125 2023-11-18 05:37:24,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=82026.66666666667, ans=0.2 2023-11-18 05:37:31,582 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 1.100e+02 1.268e+02 1.445e+02 2.035e+02, threshold=2.536e+02, percent-clipped=0.0 2023-11-18 05:37:34,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=82026.66666666667, ans=0.0 2023-11-18 05:37:36,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=82093.33333333333, ans=0.0 2023-11-18 05:37:47,874 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 300, loss[loss=0.1078, simple_loss=0.1045, pruned_loss=0.0406, audio_tagging_loss=0.01499, over 14455.00 frames. ], tot_loss[loss=0.1444, simple_loss=0.1449, pruned_loss=0.05595, audio_tagging_loss=0.01599, over 2381514.86 frames. ], batch size: 56, lr: 3.18e-02, grad_scale: 32.0 2023-11-18 05:37:56,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=82160.0, ans=0.0 2023-11-18 05:37:59,604 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.36 vs. limit=15.0 2023-11-18 05:38:19,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=82293.33333333333, ans=0.05 2023-11-18 05:38:36,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=82426.66666666667, ans=0.0 2023-11-18 05:38:43,908 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 350, loss[loss=0.1274, simple_loss=0.1303, pruned_loss=0.05093, audio_tagging_loss=0.01135, over 15252.00 frames. ], tot_loss[loss=0.1417, simple_loss=0.1428, pruned_loss=0.0551, audio_tagging_loss=0.0152, over 2526763.55 frames. 
], batch size: 59, lr: 3.18e-02, grad_scale: 32.0 2023-11-18 05:38:55,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=82560.0, ans=0.2 2023-11-18 05:39:02,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=82560.0, ans=0.125 2023-11-18 05:39:17,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=82693.33333333333, ans=0.125 2023-11-18 05:39:24,854 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 1.093e+02 1.219e+02 1.382e+02 1.971e+02, threshold=2.439e+02, percent-clipped=0.0 2023-11-18 05:39:26,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=82693.33333333333, ans=0.0 2023-11-18 05:39:34,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=82760.0, ans=0.1 2023-11-18 05:39:40,373 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 400, loss[loss=0.1149, simple_loss=0.1201, pruned_loss=0.03964, audio_tagging_loss=0.01515, over 15022.00 frames. ], tot_loss[loss=0.1406, simple_loss=0.1421, pruned_loss=0.0548, audio_tagging_loss=0.01478, over 2648784.01 frames. ], batch size: 57, lr: 3.17e-02, grad_scale: 32.0 2023-11-18 05:39:49,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=82826.66666666667, ans=0.125 2023-11-18 05:40:10,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=82960.0, ans=0.125 2023-11-18 05:40:18,690 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 05:40:19,067 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.62 vs. limit=22.5 2023-11-18 05:40:34,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=83093.33333333333, ans=0.0 2023-11-18 05:40:36,423 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 450, loss[loss=0.1417, simple_loss=0.1446, pruned_loss=0.0543, audio_tagging_loss=0.01508, over 15306.00 frames. ], tot_loss[loss=0.1418, simple_loss=0.1441, pruned_loss=0.05555, audio_tagging_loss=0.01416, over 2733395.05 frames. 
], batch size: 59, lr: 3.17e-02, grad_scale: 32.0 2023-11-18 05:40:42,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=83160.0, ans=0.0 2023-11-18 05:40:51,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=83226.66666666667, ans=0.125 2023-11-18 05:41:04,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=83293.33333333333, ans=0.1 2023-11-18 05:41:16,692 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.351e+01 1.050e+02 1.181e+02 1.351e+02 2.147e+02, threshold=2.363e+02, percent-clipped=0.0 2023-11-18 05:41:23,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=83426.66666666667, ans=0.125 2023-11-18 05:41:29,579 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.30 vs. limit=15.0 2023-11-18 05:41:32,200 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 500, loss[loss=0.1294, simple_loss=0.1214, pruned_loss=0.05352, audio_tagging_loss=0.01519, over 15495.00 frames. ], tot_loss[loss=0.1415, simple_loss=0.1435, pruned_loss=0.05582, audio_tagging_loss=0.01388, over 2797168.32 frames. ], batch size: 60, lr: 3.16e-02, grad_scale: 32.0 2023-11-18 05:41:35,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=83493.33333333333, ans=22.5 2023-11-18 05:41:42,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=83560.0, ans=0.125 2023-11-18 05:41:51,876 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.47 vs. limit=15.0 2023-11-18 05:41:55,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=83626.66666666667, ans=0.1 2023-11-18 05:41:59,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=83626.66666666667, ans=0.125 2023-11-18 05:42:27,862 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 550, loss[loss=0.1832, simple_loss=0.1859, pruned_loss=0.07711, audio_tagging_loss=0.01318, over 14953.00 frames. ], tot_loss[loss=0.1409, simple_loss=0.143, pruned_loss=0.05555, audio_tagging_loss=0.01379, over 2854336.98 frames. 
], batch size: 55, lr: 3.16e-02, grad_scale: 32.0 2023-11-18 05:42:48,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=83893.33333333333, ans=0.0 2023-11-18 05:42:58,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=83960.0, ans=0.2 2023-11-18 05:43:08,662 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 9.035e+01 1.150e+02 1.343e+02 1.676e+02 2.273e+02, threshold=2.687e+02, percent-clipped=0.0 2023-11-18 05:43:10,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=84026.66666666667, ans=0.2 2023-11-18 05:43:14,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=84093.33333333333, ans=0.125 2023-11-18 05:43:25,064 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 600, loss[loss=0.1371, simple_loss=0.1358, pruned_loss=0.05568, audio_tagging_loss=0.01351, over 14922.00 frames. ], tot_loss[loss=0.1399, simple_loss=0.1419, pruned_loss=0.05521, audio_tagging_loss=0.0137, over 2901642.54 frames. ], batch size: 57, lr: 3.15e-02, grad_scale: 32.0 2023-11-18 05:43:40,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=84226.66666666667, ans=0.125 2023-11-18 05:43:50,948 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0 2023-11-18 05:44:02,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=84360.0, ans=0.04949747468305833 2023-11-18 05:44:21,963 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 650, loss[loss=0.1454, simple_loss=0.1518, pruned_loss=0.05876, audio_tagging_loss=0.01073, over 15090.00 frames. ], tot_loss[loss=0.1411, simple_loss=0.1435, pruned_loss=0.05583, audio_tagging_loss=0.01355, over 2944672.40 frames. 
], batch size: 58, lr: 3.15e-02, grad_scale: 32.0 2023-11-18 05:44:33,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=84560.0, ans=0.125 2023-11-18 05:44:49,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=84626.66666666667, ans=0.125 2023-11-18 05:45:00,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=84693.33333333333, ans=0.1 2023-11-18 05:45:00,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=84693.33333333333, ans=0.0 2023-11-18 05:45:02,892 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.174e+01 1.067e+02 1.188e+02 1.445e+02 2.872e+02, threshold=2.375e+02, percent-clipped=1.0 2023-11-18 05:45:04,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=84693.33333333333, ans=0.2 2023-11-18 05:45:08,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=84760.0, ans=0.125 2023-11-18 05:45:11,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=84760.0, ans=0.125 2023-11-18 05:45:12,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=84760.0, ans=0.0 2023-11-18 05:45:17,840 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 700, loss[loss=0.1446, simple_loss=0.1574, pruned_loss=0.05389, audio_tagging_loss=0.01198, over 15624.00 frames. ], tot_loss[loss=0.1405, simple_loss=0.1427, pruned_loss=0.05561, audio_tagging_loss=0.01353, over 2966796.45 frames. ], batch size: 56, lr: 3.14e-02, grad_scale: 32.0 2023-11-18 05:45:23,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=84826.66666666667, ans=0.025 2023-11-18 05:45:27,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=84826.66666666667, ans=0.0 2023-11-18 05:45:44,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=84960.0, ans=0.1 2023-11-18 05:46:03,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=85093.33333333333, ans=0.05 2023-11-18 05:46:15,209 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 750, loss[loss=0.111, simple_loss=0.1152, pruned_loss=0.03877, audio_tagging_loss=0.01461, over 14636.00 frames. ], tot_loss[loss=0.1392, simple_loss=0.1416, pruned_loss=0.05478, audio_tagging_loss=0.01358, over 2988431.03 frames. ], batch size: 58, lr: 3.14e-02, grad_scale: 32.0 2023-11-18 05:46:22,254 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.62 vs. 
limit=15.0 2023-11-18 05:46:36,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=85293.33333333333, ans=0.1 2023-11-18 05:46:41,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=85293.33333333333, ans=0.04949747468305833 2023-11-18 05:46:54,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=85360.0, ans=0.2 2023-11-18 05:46:56,124 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.449e+01 1.066e+02 1.181e+02 1.360e+02 2.052e+02, threshold=2.361e+02, percent-clipped=0.0 2023-11-18 05:47:08,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=85426.66666666667, ans=0.125 2023-11-18 05:47:11,522 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 800, loss[loss=0.1097, simple_loss=0.09775, pruned_loss=0.03722, audio_tagging_loss=0.02365, over 16424.00 frames. ], tot_loss[loss=0.1388, simple_loss=0.1415, pruned_loss=0.0546, audio_tagging_loss=0.0135, over 3004995.10 frames. ], batch size: 62, lr: 3.14e-02, grad_scale: 32.0 2023-11-18 05:47:32,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=85626.66666666667, ans=0.125 2023-11-18 05:47:46,052 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=18.93 vs. limit=15.0 2023-11-18 05:47:58,453 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.53 vs. limit=15.0 2023-11-18 05:48:07,701 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 850, loss[loss=0.1581, simple_loss=0.1663, pruned_loss=0.06185, audio_tagging_loss=0.01313, over 14985.00 frames. ], tot_loss[loss=0.1388, simple_loss=0.1414, pruned_loss=0.05455, audio_tagging_loss=0.01353, over 3014162.10 frames. ], batch size: 57, lr: 3.13e-02, grad_scale: 32.0 2023-11-18 05:48:12,742 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.04 vs. limit=22.5 2023-11-18 05:48:17,162 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.06 vs. limit=15.0 2023-11-18 05:48:23,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=85893.33333333333, ans=0.1 2023-11-18 05:48:39,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=85960.0, ans=0.0 2023-11-18 05:48:48,454 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.063e+01 1.087e+02 1.227e+02 1.407e+02 2.790e+02, threshold=2.454e+02, percent-clipped=1.0 2023-11-18 05:48:57,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=86093.33333333333, ans=0.125 2023-11-18 05:49:05,241 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 900, loss[loss=0.1226, simple_loss=0.1271, pruned_loss=0.04562, audio_tagging_loss=0.01343, over 15445.00 frames. ], tot_loss[loss=0.1393, simple_loss=0.1418, pruned_loss=0.05481, audio_tagging_loss=0.01352, over 3022629.50 frames. 
], batch size: 58, lr: 3.13e-02, grad_scale: 32.0 2023-11-18 05:49:09,181 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.08 vs. limit=15.0 2023-11-18 05:49:13,350 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.01 vs. limit=15.0 2023-11-18 05:49:14,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=86160.0, ans=0.2 2023-11-18 05:49:26,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=86293.33333333333, ans=0.0 2023-11-18 05:49:45,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=86360.0, ans=0.125 2023-11-18 05:50:01,366 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 950, loss[loss=0.188, simple_loss=0.1909, pruned_loss=0.08065, audio_tagging_loss=0.01192, over 16261.00 frames. ], tot_loss[loss=0.1393, simple_loss=0.1422, pruned_loss=0.05486, audio_tagging_loss=0.01329, over 3030908.47 frames. ], batch size: 59, lr: 3.12e-02, grad_scale: 32.0 2023-11-18 05:50:03,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=86493.33333333333, ans=0.1 2023-11-18 05:50:04,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=86493.33333333333, ans=0.2 2023-11-18 05:50:26,836 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.78 vs. limit=15.0 2023-11-18 05:50:42,201 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.269e+01 1.077e+02 1.200e+02 1.388e+02 2.127e+02, threshold=2.401e+02, percent-clipped=0.0 2023-11-18 05:50:46,003 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.58 vs. limit=15.0 2023-11-18 05:50:57,282 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 1000, loss[loss=0.12, simple_loss=0.1212, pruned_loss=0.04802, audio_tagging_loss=0.01142, over 15546.00 frames. ], tot_loss[loss=0.1389, simple_loss=0.1423, pruned_loss=0.05467, audio_tagging_loss=0.01308, over 3040206.72 frames. ], batch size: 56, lr: 3.12e-02, grad_scale: 32.0 2023-11-18 05:51:12,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=86893.33333333333, ans=0.05 2023-11-18 05:51:21,413 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 05:51:22,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=86960.0, ans=0.125 2023-11-18 05:51:36,299 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.05 vs. limit=10.0 2023-11-18 05:51:41,662 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.84 vs. limit=12.0 2023-11-18 05:51:53,417 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 1050, loss[loss=0.1249, simple_loss=0.1369, pruned_loss=0.0472, audio_tagging_loss=0.009317, over 15393.00 frames. ], tot_loss[loss=0.139, simple_loss=0.1424, pruned_loss=0.05489, audio_tagging_loss=0.01287, over 3041572.54 frames. ], batch size: 58, lr: 3.11e-02, grad_scale: 32.0 2023-11-18 05:52:09,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=87226.66666666667, ans=0.125 2023-11-18 05:52:11,755 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.62 vs. limit=15.0 2023-11-18 05:52:13,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=87226.66666666667, ans=0.1 2023-11-18 05:52:21,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=87293.33333333333, ans=0.02 2023-11-18 05:52:34,265 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.017e+01 1.050e+02 1.244e+02 1.396e+02 2.108e+02, threshold=2.488e+02, percent-clipped=0.0 2023-11-18 05:52:41,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=87426.66666666667, ans=0.0 2023-11-18 05:52:45,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=87426.66666666667, ans=0.125 2023-11-18 05:52:50,696 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 1100, loss[loss=0.1266, simple_loss=0.1309, pruned_loss=0.0464, audio_tagging_loss=0.0148, over 14307.00 frames. ], tot_loss[loss=0.1379, simple_loss=0.1415, pruned_loss=0.05419, audio_tagging_loss=0.01297, over 3040858.63 frames. ], batch size: 54, lr: 3.11e-02, grad_scale: 32.0 2023-11-18 05:52:52,871 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 05:53:28,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=87693.33333333333, ans=0.125 2023-11-18 05:53:39,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=87760.0, ans=0.025 2023-11-18 05:53:42,863 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 05:53:46,906 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 1150, loss[loss=0.169, simple_loss=0.1862, pruned_loss=0.06523, audio_tagging_loss=0.01066, over 16136.00 frames. ], tot_loss[loss=0.1377, simple_loss=0.1411, pruned_loss=0.05422, audio_tagging_loss=0.01295, over 3036804.90 frames. ], batch size: 57, lr: 3.10e-02, grad_scale: 32.0 2023-11-18 05:53:56,140 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.47 vs. limit=22.5 2023-11-18 05:54:07,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=87893.33333333333, ans=0.125 2023-11-18 05:54:17,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=87960.0, ans=0.125 2023-11-18 05:54:28,331 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.667e+01 1.025e+02 1.107e+02 1.275e+02 1.816e+02, threshold=2.214e+02, percent-clipped=0.0 2023-11-18 05:54:32,919 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.807e+00 2023-11-18 05:54:43,951 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 1200, loss[loss=0.1422, simple_loss=0.1542, pruned_loss=0.05308, audio_tagging_loss=0.01201, over 14903.00 frames. ], tot_loss[loss=0.1361, simple_loss=0.1394, pruned_loss=0.05346, audio_tagging_loss=0.01301, over 3029097.34 frames. ], batch size: 55, lr: 3.10e-02, grad_scale: 32.0 2023-11-18 05:54:48,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=88160.0, ans=0.015 2023-11-18 05:55:19,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=88360.0, ans=0.125 2023-11-18 05:55:20,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=88360.0, ans=0.1 2023-11-18 05:55:26,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.63 vs. 
limit=22.5 2023-11-18 05:55:30,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=88426.66666666667, ans=0.2 2023-11-18 05:55:33,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=88426.66666666667, ans=0.125 2023-11-18 05:55:34,913 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 05:55:39,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=88493.33333333333, ans=0.2 2023-11-18 05:55:40,046 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 1250, loss[loss=0.09824, simple_loss=0.09823, pruned_loss=0.03468, audio_tagging_loss=0.01445, over 15397.00 frames. ], tot_loss[loss=0.136, simple_loss=0.1392, pruned_loss=0.05347, audio_tagging_loss=0.01291, over 3033216.91 frames. ], batch size: 61, lr: 3.09e-02, grad_scale: 32.0 2023-11-18 05:55:45,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=88493.33333333333, ans=0.125 2023-11-18 05:55:53,366 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.55 vs. limit=15.0 2023-11-18 05:55:56,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=88560.0, ans=0.2 2023-11-18 05:56:02,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=88626.66666666667, ans=0.2 2023-11-18 05:56:10,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=88626.66666666667, ans=0.125 2023-11-18 05:56:20,666 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.599e+01 1.015e+02 1.167e+02 1.344e+02 2.286e+02, threshold=2.335e+02, percent-clipped=1.0 2023-11-18 05:56:36,637 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 1300, loss[loss=0.1507, simple_loss=0.1477, pruned_loss=0.06371, audio_tagging_loss=0.0132, over 14631.00 frames. ], tot_loss[loss=0.1361, simple_loss=0.1394, pruned_loss=0.05346, audio_tagging_loss=0.013, over 3037896.08 frames. ], batch size: 56, lr: 3.09e-02, grad_scale: 32.0 2023-11-18 05:56:46,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=88893.33333333333, ans=0.125 2023-11-18 05:57:12,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=89026.66666666667, ans=0.2 2023-11-18 05:57:26,683 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0 2023-11-18 05:57:30,081 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=15.0 2023-11-18 05:57:33,128 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 1350, loss[loss=0.1489, simple_loss=0.1466, pruned_loss=0.0606, audio_tagging_loss=0.01503, over 16406.00 frames. ], tot_loss[loss=0.1365, simple_loss=0.1401, pruned_loss=0.05366, audio_tagging_loss=0.01278, over 3042959.01 frames. 
], batch size: 59, lr: 3.09e-02, grad_scale: 32.0 2023-11-18 05:57:34,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=89160.0, ans=0.125 2023-11-18 05:57:43,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=89226.66666666667, ans=0.125 2023-11-18 05:57:44,528 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 05:57:50,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=89226.66666666667, ans=0.125 2023-11-18 05:57:53,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=89226.66666666667, ans=0.1 2023-11-18 05:58:10,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=89360.0, ans=0.125 2023-11-18 05:58:14,276 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 1.078e+02 1.206e+02 1.341e+02 1.953e+02, threshold=2.412e+02, percent-clipped=0.0 2023-11-18 05:58:14,322 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 05:58:29,864 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 1400, loss[loss=0.115, simple_loss=0.1072, pruned_loss=0.04437, audio_tagging_loss=0.01702, over 15397.00 frames. ], tot_loss[loss=0.1375, simple_loss=0.1408, pruned_loss=0.05404, audio_tagging_loss=0.01303, over 3052414.06 frames. ], batch size: 61, lr: 3.08e-02, grad_scale: 32.0 2023-11-18 05:58:37,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=89493.33333333333, ans=0.0 2023-11-18 05:58:39,767 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.03 vs. limit=10.0 2023-11-18 05:58:46,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=89560.0, ans=0.0 2023-11-18 05:58:55,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=89626.66666666667, ans=0.125 2023-11-18 05:58:58,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=89626.66666666667, ans=0.2 2023-11-18 05:59:03,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=89693.33333333333, ans=0.2 2023-11-18 05:59:19,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=89760.0, ans=0.125 2023-11-18 05:59:27,054 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 1450, loss[loss=0.1302, simple_loss=0.1324, pruned_loss=0.0487, audio_tagging_loss=0.01526, over 14742.00 frames. 
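
The WARNING above shows the exclusion rule at work: an AudioSet placeholder cut with 100 input frames keeps only 23 frames after the ~4x convolutional subsampling, which cannot cover its 24 BPE tokens, so the transducer alignment would be infeasible and the cut is dropped. A sketch of that filter; the exact subsampling formula below is an assumption chosen to reproduce 100 -> 23:

    # Sketch of the filter behind the "Exclude cut ..." warnings: drop
    # cuts whose post-subsampling frame count cannot cover the token
    # sequence, since RNN-T needs at least one frame per emitted token.
    def frames_after_subsampling(num_frames: int) -> int:
        # Assumed two-stage conv front end with overall factor ~4.
        return ((num_frames - 7) // 2) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23, as in the warning
    print(keep_cut(100, 24))              # False -> cut is excluded
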
], tot_loss[loss=0.1356, simple_loss=0.1386, pruned_loss=0.05304, audio_tagging_loss=0.01322, over 3047149.08 frames. ], batch size: 55, lr: 3.08e-02, grad_scale: 32.0 2023-11-18 05:59:54,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=89960.0, ans=0.125 2023-11-18 05:59:56,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=89960.0, ans=0.125 2023-11-18 05:59:56,492 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.61 vs. limit=22.5 2023-11-18 06:00:02,581 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.00 vs. limit=6.0 2023-11-18 06:00:07,191 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.834e+01 1.064e+02 1.199e+02 1.327e+02 1.919e+02, threshold=2.398e+02, percent-clipped=0.0 2023-11-18 06:00:07,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=90026.66666666667, ans=22.5 2023-11-18 06:00:17,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=90093.33333333333, ans=0.125 2023-11-18 06:00:18,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=90093.33333333333, ans=0.0 2023-11-18 06:00:18,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=90093.33333333333, ans=0.0 2023-11-18 06:00:19,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=90093.33333333333, ans=0.125 2023-11-18 06:00:23,252 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 1500, loss[loss=0.1625, simple_loss=0.1613, pruned_loss=0.06719, audio_tagging_loss=0.0146, over 15110.00 frames. ], tot_loss[loss=0.1366, simple_loss=0.14, pruned_loss=0.05343, audio_tagging_loss=0.0132, over 3051652.05 frames. ], batch size: 57, lr: 3.07e-02, grad_scale: 32.0 2023-11-18 06:00:33,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=90226.66666666667, ans=0.125 2023-11-18 06:00:35,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=90226.66666666667, ans=0.125 2023-11-18 06:00:38,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=90226.66666666667, ans=0.2 2023-11-18 06:00:50,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=90293.33333333333, ans=0.0 2023-11-18 06:01:09,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=90426.66666666667, ans=0.2 2023-11-18 06:01:11,124 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.25 vs. 
limit=15.0 2023-11-18 06:01:15,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=90426.66666666667, ans=0.0 2023-11-18 06:01:19,520 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 1550, loss[loss=0.133, simple_loss=0.1354, pruned_loss=0.05255, audio_tagging_loss=0.01277, over 15403.00 frames. ], tot_loss[loss=0.136, simple_loss=0.1394, pruned_loss=0.05309, audio_tagging_loss=0.01315, over 3050106.91 frames. ], batch size: 57, lr: 3.07e-02, grad_scale: 32.0 2023-11-18 06:01:31,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=90560.0, ans=0.125 2023-11-18 06:01:32,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=90560.0, ans=0.1 2023-11-18 06:01:35,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=90560.0, ans=0.5 2023-11-18 06:01:38,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=90560.0, ans=0.2 2023-11-18 06:01:51,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=90626.66666666667, ans=0.125 2023-11-18 06:02:00,210 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 1.048e+02 1.182e+02 1.332e+02 1.868e+02, threshold=2.363e+02, percent-clipped=0.0 2023-11-18 06:02:15,848 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 1600, loss[loss=0.1478, simple_loss=0.1511, pruned_loss=0.06072, audio_tagging_loss=0.01159, over 13676.00 frames. ], tot_loss[loss=0.1358, simple_loss=0.139, pruned_loss=0.05308, audio_tagging_loss=0.01323, over 3058198.75 frames. ], batch size: 54, lr: 3.06e-02, grad_scale: 32.0 2023-11-18 06:02:19,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=90826.66666666667, ans=0.0 2023-11-18 06:02:24,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=90826.66666666667, ans=0.125 2023-11-18 06:02:47,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=90960.0, ans=0.125 2023-11-18 06:02:53,777 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.87 vs. limit=10.0 2023-11-18 06:03:00,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=91093.33333333333, ans=0.04949747468305833 2023-11-18 06:03:12,235 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 1650, loss[loss=0.1349, simple_loss=0.1438, pruned_loss=0.05061, audio_tagging_loss=0.0124, over 15375.00 frames. ], tot_loss[loss=0.1358, simple_loss=0.1386, pruned_loss=0.05314, audio_tagging_loss=0.01338, over 3063352.74 frames. 
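
The Whitening lines are diagnostics from activation-whitening points in the encoder: each measures how far the covariance of its activations is from isotropic and reports "metric=X vs. limit=Y", with a corrective penalty applied only when the metric exceeds its (scheduled) limit. The concrete metric below, which is 1.0 for perfectly white activations and grows as a few directions dominate, is an assumed stand-in for the exact scaling.py formula:

    # Sketch of a whitening diagnostic: compare the eigenvalue spread of
    # the activation covariance against the perfectly-white case where
    # all eigenvalues are equal (metric == 1.0).
    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels)
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        d = x.shape[1]
        return d * (eigs ** 2).sum() / eigs.sum() ** 2

    x = torch.randn(1000, 384)
    metric = whitening_metric(x)   # ~1.0 for white Gaussian input
    penalize = metric > 15.0       # mirrors "metric=... vs. limit=15.0"
    print(float(metric), bool(penalize))
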
], batch size: 57, lr: 3.06e-02, grad_scale: 32.0 2023-11-18 06:03:14,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=91160.0, ans=0.0 2023-11-18 06:03:15,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=91160.0, ans=0.125 2023-11-18 06:03:45,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=91360.0, ans=0.125 2023-11-18 06:03:48,979 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.02 vs. limit=15.0 2023-11-18 06:03:53,161 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.238e+01 1.052e+02 1.201e+02 1.408e+02 1.916e+02, threshold=2.401e+02, percent-clipped=0.0 2023-11-18 06:04:04,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=91426.66666666667, ans=12.0 2023-11-18 06:04:09,197 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 1700, loss[loss=0.1181, simple_loss=0.1274, pruned_loss=0.0429, audio_tagging_loss=0.01151, over 14187.00 frames. ], tot_loss[loss=0.1363, simple_loss=0.1394, pruned_loss=0.05331, audio_tagging_loss=0.01327, over 3053154.29 frames. ], batch size: 54, lr: 3.06e-02, grad_scale: 32.0 2023-11-18 06:04:15,604 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0 2023-11-18 06:04:50,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=91693.33333333333, ans=0.125 2023-11-18 06:04:59,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=91760.0, ans=0.2 2023-11-18 06:05:06,087 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 1750, loss[loss=0.1609, simple_loss=0.1715, pruned_loss=0.06418, audio_tagging_loss=0.01095, over 15908.00 frames. ], tot_loss[loss=0.1363, simple_loss=0.1397, pruned_loss=0.05327, audio_tagging_loss=0.01313, over 3050251.02 frames. ], batch size: 56, lr: 3.05e-02, grad_scale: 32.0 2023-11-18 06:05:09,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.58 vs. limit=15.0 2023-11-18 06:05:17,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=91893.33333333333, ans=0.125 2023-11-18 06:05:43,609 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.75 vs. 
limit=15.0 2023-11-18 06:05:47,228 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.450e+01 1.111e+02 1.237e+02 1.383e+02 2.082e+02, threshold=2.473e+02, percent-clipped=0.0 2023-11-18 06:05:51,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=92093.33333333333, ans=0.125 2023-11-18 06:05:51,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=92093.33333333333, ans=0.125 2023-11-18 06:05:53,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=92093.33333333333, ans=0.0 2023-11-18 06:06:02,397 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 1800, loss[loss=0.1711, simple_loss=0.1883, pruned_loss=0.06754, audio_tagging_loss=0.009454, over 15912.00 frames. ], tot_loss[loss=0.1369, simple_loss=0.1408, pruned_loss=0.05357, audio_tagging_loss=0.01294, over 3048996.56 frames. ], batch size: 59, lr: 3.05e-02, grad_scale: 32.0 2023-11-18 06:06:09,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=92160.0, ans=0.125 2023-11-18 06:06:17,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=92226.66666666667, ans=0.2 2023-11-18 06:06:29,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=92293.33333333333, ans=0.125 2023-11-18 06:06:29,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=92293.33333333333, ans=10.0 2023-11-18 06:06:35,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=92360.0, ans=0.125 2023-11-18 06:06:40,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=92360.0, ans=0.1 2023-11-18 06:06:57,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=92426.66666666667, ans=0.0 2023-11-18 06:06:59,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=92493.33333333333, ans=0.125 2023-11-18 06:06:59,998 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 1850, loss[loss=0.1075, simple_loss=0.1058, pruned_loss=0.03911, audio_tagging_loss=0.01546, over 15332.00 frames. ], tot_loss[loss=0.1366, simple_loss=0.1407, pruned_loss=0.05343, audio_tagging_loss=0.01276, over 3045594.82 frames. 
], batch size: 58, lr: 3.04e-02, grad_scale: 32.0 2023-11-18 06:07:12,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=92560.0, ans=0.5 2023-11-18 06:07:23,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=92626.66666666667, ans=0.2 2023-11-18 06:07:30,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=92626.66666666667, ans=0.0 2023-11-18 06:07:34,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=92693.33333333333, ans=0.125 2023-11-18 06:07:40,255 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.965e+01 1.045e+02 1.179e+02 1.336e+02 1.806e+02, threshold=2.358e+02, percent-clipped=0.0 2023-11-18 06:07:44,243 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.95 vs. limit=12.0 2023-11-18 06:07:49,900 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.73 vs. limit=10.0 2023-11-18 06:07:51,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=92760.0, ans=0.1 2023-11-18 06:07:55,795 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 1900, loss[loss=0.106, simple_loss=0.09995, pruned_loss=0.0397, audio_tagging_loss=0.01632, over 16907.00 frames. ], tot_loss[loss=0.1353, simple_loss=0.1394, pruned_loss=0.05278, audio_tagging_loss=0.01281, over 3045539.08 frames. ], batch size: 66, lr: 3.04e-02, grad_scale: 32.0 2023-11-18 06:08:24,933 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0 2023-11-18 06:08:51,656 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 1950, loss[loss=0.122, simple_loss=0.119, pruned_loss=0.04652, audio_tagging_loss=0.01591, over 15958.00 frames. ], tot_loss[loss=0.1332, simple_loss=0.1369, pruned_loss=0.05186, audio_tagging_loss=0.0129, over 3041897.10 frames. ], batch size: 63, lr: 3.03e-02, grad_scale: 32.0 2023-11-18 06:09:32,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=93360.0, ans=0.02 2023-11-18 06:09:32,959 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.608e+01 1.040e+02 1.152e+02 1.328e+02 1.978e+02, threshold=2.303e+02, percent-clipped=0.0 2023-11-18 06:09:34,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=93360.0, ans=0.0 2023-11-18 06:09:34,673 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.11 vs. limit=6.0 2023-11-18 06:09:49,681 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 2000, loss[loss=0.1602, simple_loss=0.1678, pruned_loss=0.06169, audio_tagging_loss=0.01465, over 14765.00 frames. ], tot_loss[loss=0.1332, simple_loss=0.1367, pruned_loss=0.05196, audio_tagging_loss=0.01288, over 3041546.78 frames. 
], batch size: 54, lr: 3.03e-02, grad_scale: 64.0 2023-11-18 06:09:53,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=93493.33333333333, ans=0.2 2023-11-18 06:09:54,624 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.55 vs. limit=15.0 2023-11-18 06:10:00,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=93560.0, ans=0.1 2023-11-18 06:10:04,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=93560.0, ans=0.125 2023-11-18 06:10:05,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=93560.0, ans=0.125 2023-11-18 06:10:08,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=93560.0, ans=0.1 2023-11-18 06:10:10,119 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.55 vs. limit=6.0 2023-11-18 06:10:26,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=93693.33333333333, ans=0.1 2023-11-18 06:10:45,928 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 2050, loss[loss=0.1561, simple_loss=0.1595, pruned_loss=0.06214, audio_tagging_loss=0.01424, over 14667.00 frames. ], tot_loss[loss=0.1334, simple_loss=0.1372, pruned_loss=0.05187, audio_tagging_loss=0.01289, over 3038184.14 frames. ], batch size: 54, lr: 3.03e-02, grad_scale: 64.0 2023-11-18 06:10:57,368 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.89 vs. limit=15.0 2023-11-18 06:11:03,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=93893.33333333333, ans=0.2 2023-11-18 06:11:10,188 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0 2023-11-18 06:11:26,281 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.950e+01 1.053e+02 1.201e+02 1.345e+02 1.920e+02, threshold=2.401e+02, percent-clipped=0.0 2023-11-18 06:11:34,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=94093.33333333333, ans=0.125 2023-11-18 06:11:41,152 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 2100, loss[loss=0.1595, simple_loss=0.1685, pruned_loss=0.06594, audio_tagging_loss=0.009365, over 14487.00 frames. ], tot_loss[loss=0.1349, simple_loss=0.1392, pruned_loss=0.05251, audio_tagging_loss=0.01277, over 3037809.52 frames. 
], batch size: 54, lr: 3.02e-02, grad_scale: 64.0 2023-11-18 06:11:46,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=94160.0, ans=0.125 2023-11-18 06:11:56,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=94226.66666666667, ans=0.125 2023-11-18 06:12:27,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=94426.66666666667, ans=0.0 2023-11-18 06:12:35,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=94426.66666666667, ans=0.0 2023-11-18 06:12:37,006 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 2150, loss[loss=0.1322, simple_loss=0.1303, pruned_loss=0.05318, audio_tagging_loss=0.0139, over 15204.00 frames. ], tot_loss[loss=0.1353, simple_loss=0.1395, pruned_loss=0.05268, audio_tagging_loss=0.01283, over 3037021.84 frames. ], batch size: 57, lr: 3.02e-02, grad_scale: 64.0 2023-11-18 06:12:47,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=94493.33333333333, ans=0.0 2023-11-18 06:12:54,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=94560.0, ans=0.0 2023-11-18 06:13:01,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=94626.66666666667, ans=0.0 2023-11-18 06:13:03,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=94626.66666666667, ans=0.1 2023-11-18 06:13:10,150 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:13:18,102 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.172e+01 1.043e+02 1.205e+02 1.372e+02 2.009e+02, threshold=2.410e+02, percent-clipped=0.0 2023-11-18 06:13:22,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=94760.0, ans=0.1 2023-11-18 06:13:31,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=94760.0, ans=0.125 2023-11-18 06:13:31,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=94760.0, ans=0.125 2023-11-18 06:13:34,782 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 2200, loss[loss=0.1528, simple_loss=0.1655, pruned_loss=0.05722, audio_tagging_loss=0.01281, over 17338.00 frames. ], tot_loss[loss=0.1365, simple_loss=0.1409, pruned_loss=0.0533, audio_tagging_loss=0.01277, over 3036798.43 frames. 
], batch size: 61, lr: 3.01e-02, grad_scale: 64.0 2023-11-18 06:13:41,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=94826.66666666667, ans=0.125 2023-11-18 06:13:46,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=94893.33333333333, ans=0.125 2023-11-18 06:13:51,461 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.95 vs. limit=15.0 2023-11-18 06:13:51,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=94893.33333333333, ans=0.0 2023-11-18 06:13:52,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=94893.33333333333, ans=0.2 2023-11-18 06:13:57,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=94960.0, ans=0.07 2023-11-18 06:14:02,120 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 06:14:30,435 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 2250, loss[loss=0.2021, simple_loss=0.2018, pruned_loss=0.08985, audio_tagging_loss=0.01134, over 15085.00 frames. ], tot_loss[loss=0.1375, simple_loss=0.1417, pruned_loss=0.05386, audio_tagging_loss=0.0128, over 3046324.54 frames. ], batch size: 57, lr: 3.01e-02, grad_scale: 32.0 2023-11-18 06:14:31,981 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.87 vs. limit=15.0 2023-11-18 06:14:58,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=95293.33333333333, ans=0.125 2023-11-18 06:14:58,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=95293.33333333333, ans=0.125 2023-11-18 06:15:00,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=95293.33333333333, ans=0.1 2023-11-18 06:15:08,801 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.35 vs. limit=15.0 2023-11-18 06:15:12,286 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.736e+01 1.062e+02 1.230e+02 1.401e+02 2.481e+02, threshold=2.461e+02, percent-clipped=1.0 2023-11-18 06:15:21,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=95426.66666666667, ans=0.0 2023-11-18 06:15:26,929 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 2300, loss[loss=0.1299, simple_loss=0.1217, pruned_loss=0.05329, audio_tagging_loss=0.01577, over 14575.00 frames. ], tot_loss[loss=0.1375, simple_loss=0.142, pruned_loss=0.05369, audio_tagging_loss=0.01284, over 3041326.03 frames. 
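
The grad_scale column doubles from 32.0 to 64.0 at batch 2000 and is back to 32.0 by batch 2250, which matches the standard torch.cuda.amp.GradScaler policy for fp16 training: double the scale after growth_interval (default 2000) consecutive overflow-free steps, and halve it whenever a step produces inf/nan gradients. A generic AMP step sketch (not the train_asr.py loop; init_scale is illustrative):

    # Sketch of the fp16 loss-scaling dynamics behind the grad_scale column.
    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=2.0,       # illustrative starting scale
        growth_factor=2.0,    # doubling, as seen at batch 2000
        backoff_factor=0.5,   # halving on overflow, as seen by batch 2250
        growth_interval=2000, # overflow-free steps between doublings
    )

    def amp_step(model, batch, optimizer, compute_loss):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = compute_loss(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # skipped internally if grads overflowed
        scaler.update()          # grows or backs off the scale here
        return loss.detach(), scaler.get_scale()
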
], batch size: 56, lr: 3.01e-02, grad_scale: 32.0 2023-11-18 06:16:11,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=95760.0, ans=0.025 2023-11-18 06:16:12,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=95760.0, ans=0.125 2023-11-18 06:16:13,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=95760.0, ans=0.1 2023-11-18 06:16:15,562 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:16:21,063 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.87 vs. limit=15.0 2023-11-18 06:16:24,214 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 2350, loss[loss=0.1142, simple_loss=0.1085, pruned_loss=0.0422, audio_tagging_loss=0.01772, over 16181.00 frames. ], tot_loss[loss=0.1375, simple_loss=0.1421, pruned_loss=0.05366, audio_tagging_loss=0.01287, over 3047686.63 frames. ], batch size: 61, lr: 3.00e-02, grad_scale: 32.0 2023-11-18 06:16:25,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=95826.66666666667, ans=0.035 2023-11-18 06:16:30,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=95826.66666666667, ans=15.0 2023-11-18 06:16:40,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=95893.33333333333, ans=0.2 2023-11-18 06:16:44,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=95960.0, ans=0.1 2023-11-18 06:16:49,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=95960.0, ans=0.0 2023-11-18 06:16:50,316 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.21 vs. limit=22.5 2023-11-18 06:17:06,432 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.925e+01 1.033e+02 1.167e+02 1.342e+02 2.194e+02, threshold=2.335e+02, percent-clipped=0.0 2023-11-18 06:17:13,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=96093.33333333333, ans=0.0 2023-11-18 06:17:20,437 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 2400, loss[loss=0.1598, simple_loss=0.1787, pruned_loss=0.05762, audio_tagging_loss=0.01279, over 14818.00 frames. ], tot_loss[loss=0.1369, simple_loss=0.1413, pruned_loss=0.05327, audio_tagging_loss=0.01301, over 3049910.94 frames. 
], batch size: 57, lr: 3.00e-02, grad_scale: 32.0 2023-11-18 06:17:25,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=96160.0, ans=0.1 2023-11-18 06:17:44,680 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.37 vs. limit=12.0 2023-11-18 06:18:16,570 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 2450, loss[loss=0.1041, simple_loss=0.1034, pruned_loss=0.03768, audio_tagging_loss=0.01471, over 15314.00 frames. ], tot_loss[loss=0.1363, simple_loss=0.1404, pruned_loss=0.05283, audio_tagging_loss=0.01326, over 3043213.72 frames. ], batch size: 60, lr: 2.99e-02, grad_scale: 32.0 2023-11-18 06:18:19,272 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.18 vs. limit=15.0 2023-11-18 06:18:20,249 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.66 vs. limit=12.0 2023-11-18 06:18:30,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=96560.0, ans=0.0 2023-11-18 06:18:36,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=96560.0, ans=0.2 2023-11-18 06:18:45,552 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.96 vs. limit=15.0 2023-11-18 06:18:58,688 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.568e+01 1.053e+02 1.171e+02 1.330e+02 1.894e+02, threshold=2.342e+02, percent-clipped=0.0 2023-11-18 06:19:13,739 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 2500, loss[loss=0.133, simple_loss=0.1309, pruned_loss=0.05314, audio_tagging_loss=0.01437, over 15334.00 frames. ], tot_loss[loss=0.1352, simple_loss=0.1391, pruned_loss=0.05241, audio_tagging_loss=0.01324, over 3050415.69 frames. ], batch size: 56, lr: 2.99e-02, grad_scale: 32.0 2023-11-18 06:19:24,491 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.44 vs. limit=15.0 2023-11-18 06:19:30,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=96893.33333333333, ans=0.1 2023-11-18 06:19:43,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=96960.0, ans=0.0 2023-11-18 06:19:47,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=97026.66666666667, ans=0.1 2023-11-18 06:19:49,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=97026.66666666667, ans=0.0 2023-11-18 06:20:01,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=97093.33333333333, ans=0.125 2023-11-18 06:20:10,029 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 2550, loss[loss=0.1165, simple_loss=0.12, pruned_loss=0.04666, audio_tagging_loss=0.009852, over 14203.00 frames. ], tot_loss[loss=0.1347, simple_loss=0.1385, pruned_loss=0.05223, audio_tagging_loss=0.01326, over 3040290.16 frames. 
], batch size: 55, lr: 2.98e-02, grad_scale: 32.0 2023-11-18 06:20:12,722 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.93 vs. limit=22.5 2023-11-18 06:20:26,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=97226.66666666667, ans=0.125 2023-11-18 06:20:30,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=97226.66666666667, ans=0.125 2023-11-18 06:20:34,011 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.26 vs. limit=15.0 2023-11-18 06:20:36,178 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.20 vs. limit=15.0 2023-11-18 06:20:46,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=97360.0, ans=0.1 2023-11-18 06:20:51,781 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.040e+01 1.036e+02 1.193e+02 1.343e+02 1.842e+02, threshold=2.386e+02, percent-clipped=0.0 2023-11-18 06:21:01,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=97426.66666666667, ans=0.95 2023-11-18 06:21:06,241 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 2600, loss[loss=0.09729, simple_loss=0.1086, pruned_loss=0.03281, audio_tagging_loss=0.01017, over 14805.00 frames. ], tot_loss[loss=0.1338, simple_loss=0.1376, pruned_loss=0.05189, audio_tagging_loss=0.01307, over 3044088.33 frames. ], batch size: 54, lr: 2.98e-02, grad_scale: 32.0 2023-11-18 06:21:15,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=97493.33333333333, ans=0.05 2023-11-18 06:21:30,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=97626.66666666667, ans=0.125 2023-11-18 06:22:02,791 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 2650, loss[loss=0.1419, simple_loss=0.1532, pruned_loss=0.05493, audio_tagging_loss=0.01038, over 16066.00 frames. ], tot_loss[loss=0.1337, simple_loss=0.1379, pruned_loss=0.05189, audio_tagging_loss=0.01292, over 3046003.02 frames. ], batch size: 60, lr: 2.98e-02, grad_scale: 32.0 2023-11-18 06:22:42,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=98026.66666666667, ans=0.125 2023-11-18 06:22:44,861 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.930e+01 1.069e+02 1.231e+02 1.397e+02 2.138e+02, threshold=2.463e+02, percent-clipped=0.0 2023-11-18 06:22:45,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=98026.66666666667, ans=0.0 2023-11-18 06:22:59,921 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 2700, loss[loss=0.1611, simple_loss=0.1665, pruned_loss=0.06468, audio_tagging_loss=0.01315, over 14560.00 frames. ], tot_loss[loss=0.1342, simple_loss=0.1385, pruned_loss=0.05207, audio_tagging_loss=0.01285, over 3044991.70 frames. 
], batch size: 53, lr: 2.97e-02, grad_scale: 32.0 2023-11-18 06:23:07,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=98160.0, ans=0.1 2023-11-18 06:23:20,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=98226.66666666667, ans=0.125 2023-11-18 06:23:35,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=98360.0, ans=0.2 2023-11-18 06:23:49,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=98426.66666666667, ans=0.1 2023-11-18 06:23:56,186 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 2750, loss[loss=0.1349, simple_loss=0.1367, pruned_loss=0.0563, audio_tagging_loss=0.01022, over 14509.00 frames. ], tot_loss[loss=0.1339, simple_loss=0.1383, pruned_loss=0.05191, audio_tagging_loss=0.01285, over 3048582.33 frames. ], batch size: 55, lr: 2.97e-02, grad_scale: 32.0 2023-11-18 06:24:06,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=98560.0, ans=0.125 2023-11-18 06:24:07,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=98560.0, ans=0.0 2023-11-18 06:24:19,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=98626.66666666667, ans=0.125 2023-11-18 06:24:29,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=98693.33333333333, ans=0.125 2023-11-18 06:24:30,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=98693.33333333333, ans=0.2 2023-11-18 06:24:37,490 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 1.008e+02 1.194e+02 1.354e+02 1.877e+02, threshold=2.388e+02, percent-clipped=0.0 2023-11-18 06:24:37,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=98693.33333333333, ans=0.125 2023-11-18 06:24:42,372 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:24:52,457 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 2800, loss[loss=0.1351, simple_loss=0.137, pruned_loss=0.05342, audio_tagging_loss=0.01324, over 15013.00 frames. ], tot_loss[loss=0.1339, simple_loss=0.1386, pruned_loss=0.05191, audio_tagging_loss=0.0127, over 3045439.81 frames. 
], batch size: 56, lr: 2.96e-02, grad_scale: 32.0 2023-11-18 06:25:03,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=98893.33333333333, ans=0.95 2023-11-18 06:25:07,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=98893.33333333333, ans=0.125 2023-11-18 06:25:10,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=98893.33333333333, ans=0.0 2023-11-18 06:25:39,950 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.30 vs. limit=15.0 2023-11-18 06:25:40,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=99093.33333333333, ans=0.0 2023-11-18 06:25:48,864 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 2850, loss[loss=0.1577, simple_loss=0.1615, pruned_loss=0.06619, audio_tagging_loss=0.01079, over 15960.00 frames. ], tot_loss[loss=0.1334, simple_loss=0.1383, pruned_loss=0.05152, audio_tagging_loss=0.01276, over 3046374.15 frames. ], batch size: 58, lr: 2.96e-02, grad_scale: 32.0 2023-11-18 06:26:00,569 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.43 vs. limit=22.5 2023-11-18 06:26:13,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=99293.33333333333, ans=0.0 2023-11-18 06:26:17,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.48 vs. limit=22.5 2023-11-18 06:26:26,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=99360.0, ans=0.125 2023-11-18 06:26:28,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=99360.0, ans=0.5 2023-11-18 06:26:30,676 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.061e+01 1.055e+02 1.276e+02 1.437e+02 2.072e+02, threshold=2.552e+02, percent-clipped=0.0 2023-11-18 06:26:32,277 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.62 vs. limit=15.0 2023-11-18 06:26:38,853 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0 2023-11-18 06:26:45,308 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 2900, loss[loss=0.1431, simple_loss=0.1504, pruned_loss=0.05538, audio_tagging_loss=0.01247, over 14788.00 frames. ], tot_loss[loss=0.1334, simple_loss=0.1378, pruned_loss=0.0517, audio_tagging_loss=0.01279, over 3044419.00 frames. 
], batch size: 57, lr: 2.96e-02, grad_scale: 32.0 2023-11-18 06:26:56,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=99560.0, ans=0.125 2023-11-18 06:27:08,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=99626.66666666667, ans=0.2 2023-11-18 06:27:08,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=99626.66666666667, ans=0.0 2023-11-18 06:27:09,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=99626.66666666667, ans=0.0 2023-11-18 06:27:10,041 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.14 vs. limit=15.0 2023-11-18 06:27:42,250 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 2950, loss[loss=0.112, simple_loss=0.11, pruned_loss=0.04252, audio_tagging_loss=0.01443, over 15366.00 frames. ], tot_loss[loss=0.1348, simple_loss=0.1395, pruned_loss=0.05233, audio_tagging_loss=0.01274, over 3048263.54 frames. ], batch size: 58, lr: 2.95e-02, grad_scale: 16.0 2023-11-18 06:27:45,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=99826.66666666667, ans=0.1 2023-11-18 06:28:00,303 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.65 vs. limit=15.0 2023-11-18 06:28:06,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=99960.0, ans=0.125 2023-11-18 06:28:10,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=99960.0, ans=0.0 2023-11-18 06:28:16,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=100026.66666666667, ans=0.125 2023-11-18 06:28:19,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=100026.66666666667, ans=0.125 2023-11-18 06:28:25,091 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.080e+01 1.058e+02 1.250e+02 1.448e+02 1.793e+02, threshold=2.500e+02, percent-clipped=0.0 2023-11-18 06:28:31,428 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 06:28:36,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=100093.33333333333, ans=0.04949747468305833 2023-11-18 06:28:38,683 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 3000, loss[loss=0.1438, simple_loss=0.1475, pruned_loss=0.05538, audio_tagging_loss=0.01473, over 14750.00 frames. ], tot_loss[loss=0.1347, simple_loss=0.1391, pruned_loss=0.05228, audio_tagging_loss=0.01283, over 3048143.70 frames. ], batch size: 55, lr: 2.95e-02, grad_scale: 16.0 2023-11-18 06:28:38,684 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-18 06:29:12,289 INFO [train_asr.py:1147] (3/4) Epoch 2, validation: loss=0.0901, simple_loss=0.07118, pruned_loss=0.01674, audio_tagging_loss=0.03777, over 4681554.00 frames. 
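The loss fields in these entries combine linearly: with simple_loss_scale=0.5 and audio_tagging_loss_scale=1.0 from this run's configuration, every tot_loss and validation loss printed here satisfies loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss. A minimal sketch that checks this against the validation entry directly above; the combination is inferred from the logged numbers, not quoted from train_asr.py:

def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    # Assumed combination, inferred from the logged values of this run.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Validation at epoch 2, batch 3000: loss=0.0901, simple_loss=0.07118,
# pruned_loss=0.01674, audio_tagging_loss=0.03777.
assert abs(combined_loss(0.07118, 0.01674, 0.03777) - 0.0901) < 5e-4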
2023-11-18 06:29:12,290 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-18 06:29:20,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=100160.0, ans=0.0 2023-11-18 06:29:46,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=100360.0, ans=0.0 2023-11-18 06:29:48,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=100360.0, ans=0.125 2023-11-18 06:29:50,197 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.24 vs. limit=15.0 2023-11-18 06:30:03,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=100426.66666666667, ans=0.125 2023-11-18 06:30:07,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=100493.33333333333, ans=0.0 2023-11-18 06:30:08,707 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 3050, loss[loss=0.1224, simple_loss=0.1203, pruned_loss=0.04864, audio_tagging_loss=0.01365, over 15474.00 frames. ], tot_loss[loss=0.136, simple_loss=0.1407, pruned_loss=0.05297, audio_tagging_loss=0.01272, over 3051882.98 frames. ], batch size: 58, lr: 2.94e-02, grad_scale: 16.0 2023-11-18 06:30:26,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=100560.0, ans=0.07 2023-11-18 06:30:33,266 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.17 vs. limit=22.5 2023-11-18 06:30:39,946 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:30:51,083 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.130e+01 1.055e+02 1.164e+02 1.306e+02 1.882e+02, threshold=2.329e+02, percent-clipped=0.0 2023-11-18 06:30:52,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=100760.0, ans=0.95 2023-11-18 06:30:53,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=100760.0, ans=0.125 2023-11-18 06:30:56,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=100760.0, ans=0.0 2023-11-18 06:31:04,634 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 3100, loss[loss=0.1278, simple_loss=0.1344, pruned_loss=0.04633, audio_tagging_loss=0.01434, over 15927.00 frames. ], tot_loss[loss=0.1375, simple_loss=0.1427, pruned_loss=0.05341, audio_tagging_loss=0.01276, over 3053352.65 frames. 
], batch size: 59, lr: 2.94e-02, grad_scale: 16.0 2023-11-18 06:31:19,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=100893.33333333333, ans=0.2 2023-11-18 06:31:26,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=100960.0, ans=0.5 2023-11-18 06:31:29,538 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.76 vs. limit=15.0 2023-11-18 06:31:36,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=100960.0, ans=0.125 2023-11-18 06:31:41,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=101026.66666666667, ans=0.125 2023-11-18 06:31:41,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=101026.66666666667, ans=0.125 2023-11-18 06:31:43,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=101026.66666666667, ans=0.035 2023-11-18 06:31:55,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=101093.33333333333, ans=0.1 2023-11-18 06:32:00,115 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 3150, loss[loss=0.1111, simple_loss=0.1059, pruned_loss=0.0411, audio_tagging_loss=0.01701, over 14013.00 frames. ], tot_loss[loss=0.1377, simple_loss=0.1427, pruned_loss=0.05352, audio_tagging_loss=0.01284, over 3054377.49 frames. ], batch size: 55, lr: 2.94e-02, grad_scale: 16.0 2023-11-18 06:32:16,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=101226.66666666667, ans=0.0 2023-11-18 06:32:18,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=101226.66666666667, ans=0.125 2023-11-18 06:32:18,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=101226.66666666667, ans=22.5 2023-11-18 06:32:39,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=101360.0, ans=0.125 2023-11-18 06:32:43,611 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.183e+01 1.035e+02 1.177e+02 1.341e+02 1.863e+02, threshold=2.355e+02, percent-clipped=0.0 2023-11-18 06:32:43,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=101360.0, ans=0.1 2023-11-18 06:32:57,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=101493.33333333333, ans=0.2 2023-11-18 06:32:58,085 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 3200, loss[loss=0.09289, simple_loss=0.08653, pruned_loss=0.03281, audio_tagging_loss=0.01682, over 15537.00 frames. ], tot_loss[loss=0.1371, simple_loss=0.1419, pruned_loss=0.05315, audio_tagging_loss=0.01302, over 3050188.02 frames. ], batch size: 58, lr: 2.93e-02, grad_scale: 32.0
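The scaling.py:213 entries above report the current value (ans) of schedulable module constants such as dropout_p and the various skip rates at the given batch_count. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints as in icefall's ScheduledFloat; the breakpoints below are illustrative, not this run's:

class PiecewiseSchedule:
    # Piecewise-linear schedule over batch_count (assumed behavior).
    def __init__(self, *points):
        self.points = points  # ascending (batch_count, value) pairs

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return pts[-1][1]

# Illustrative: a skip rate decaying from 0.5 to 0.0 over the first 20k
# batch counts; past the last breakpoint it stays at its floor, which is
# why many entries around batch_count ~1e5 print ans=0.0.
skip_rate = PiecewiseSchedule((0.0, 0.5), (20000.0, 0.0))
print(skip_rate(101360.0))  # -> 0.0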
2023-11-18 06:33:03,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=101493.33333333333, ans=0.125 2023-11-18 06:33:20,754 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.78 vs. limit=15.0 2023-11-18 06:33:23,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=101626.66666666667, ans=0.125 2023-11-18 06:33:24,343 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.01 vs. limit=15.0 2023-11-18 06:33:30,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=101693.33333333333, ans=0.0 2023-11-18 06:33:30,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=101693.33333333333, ans=0.2 2023-11-18 06:33:42,357 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.01 vs. limit=22.5 2023-11-18 06:33:50,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=101760.0, ans=0.125 2023-11-18 06:33:54,451 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 3250, loss[loss=0.1302, simple_loss=0.1394, pruned_loss=0.04747, audio_tagging_loss=0.01306, over 14811.00 frames. ], tot_loss[loss=0.1367, simple_loss=0.1414, pruned_loss=0.05291, audio_tagging_loss=0.01312, over 3052596.86 frames. ], batch size: 56, lr: 2.93e-02, grad_scale: 32.0 2023-11-18 06:33:56,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=101826.66666666667, ans=0.2 2023-11-18 06:34:02,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=101826.66666666667, ans=0.125 2023-11-18 06:34:10,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=101893.33333333333, ans=0.125 2023-11-18 06:34:17,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=101960.0, ans=0.0 2023-11-18 06:34:19,128 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.06 vs. limit=15.0 2023-11-18 06:34:35,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=102026.66666666667, ans=0.125 2023-11-18 06:34:37,264 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.501e+01 1.067e+02 1.209e+02 1.454e+02 2.188e+02, threshold=2.419e+02, percent-clipped=0.0 2023-11-18 06:34:44,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=102093.33333333333, ans=0.125 2023-11-18 06:34:50,085 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 3300, loss[loss=0.1326, simple_loss=0.1354, pruned_loss=0.05434, audio_tagging_loss=0.01055, over 15304.00 frames. ], tot_loss[loss=0.1373, simple_loss=0.1418, pruned_loss=0.05331, audio_tagging_loss=0.01311, over 3058125.07 frames.
], batch size: 60, lr: 2.93e-02, grad_scale: 32.0 2023-11-18 06:35:00,817 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=15.0 2023-11-18 06:35:17,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=102293.33333333333, ans=0.125 2023-11-18 06:35:25,812 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.57 vs. limit=22.5 2023-11-18 06:35:41,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=102426.66666666667, ans=0.05 2023-11-18 06:35:43,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=102426.66666666667, ans=0.0 2023-11-18 06:35:46,861 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 3350, loss[loss=0.1396, simple_loss=0.1452, pruned_loss=0.05302, audio_tagging_loss=0.01404, over 16839.00 frames. ], tot_loss[loss=0.1367, simple_loss=0.1416, pruned_loss=0.05301, audio_tagging_loss=0.0129, over 3060236.77 frames. ], batch size: 65, lr: 2.92e-02, grad_scale: 32.0 2023-11-18 06:36:15,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=102626.66666666667, ans=0.0 2023-11-18 06:36:30,158 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.335e+01 1.052e+02 1.183e+02 1.313e+02 1.850e+02, threshold=2.366e+02, percent-clipped=0.0 2023-11-18 06:36:31,690 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.67 vs. limit=15.0 2023-11-18 06:36:42,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=102760.0, ans=0.125 2023-11-18 06:36:44,255 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 3400, loss[loss=0.1227, simple_loss=0.1316, pruned_loss=0.04052, audio_tagging_loss=0.01641, over 14659.00 frames. ], tot_loss[loss=0.1366, simple_loss=0.1417, pruned_loss=0.05302, audio_tagging_loss=0.01274, over 3068552.80 frames. ], batch size: 53, lr: 2.92e-02, grad_scale: 32.0 2023-11-18 06:37:05,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=102960.0, ans=0.09899494936611666 2023-11-18 06:37:11,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=102960.0, ans=0.0 2023-11-18 06:37:24,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=103026.66666666667, ans=0.2 2023-11-18 06:37:36,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=103093.33333333333, ans=0.125 2023-11-18 06:37:39,600 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 3450, loss[loss=0.1359, simple_loss=0.1467, pruned_loss=0.05137, audio_tagging_loss=0.01123, over 14219.00 frames. ], tot_loss[loss=0.136, simple_loss=0.1414, pruned_loss=0.05271, audio_tagging_loss=0.01262, over 3062745.79 frames. 
], batch size: 55, lr: 2.91e-02, grad_scale: 32.0 2023-11-18 06:37:44,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=103160.0, ans=0.125 2023-11-18 06:37:48,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=103160.0, ans=0.125 2023-11-18 06:37:51,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=103226.66666666667, ans=0.07 2023-11-18 06:38:02,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=103293.33333333333, ans=0.0 2023-11-18 06:38:21,874 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.158e+01 1.088e+02 1.277e+02 1.401e+02 2.193e+02, threshold=2.554e+02, percent-clipped=0.0 2023-11-18 06:38:35,893 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 3500, loss[loss=0.109, simple_loss=0.1151, pruned_loss=0.03783, audio_tagging_loss=0.01358, over 15280.00 frames. ], tot_loss[loss=0.135, simple_loss=0.1405, pruned_loss=0.05217, audio_tagging_loss=0.01255, over 3056527.74 frames. ], batch size: 58, lr: 2.91e-02, grad_scale: 32.0 2023-11-18 06:38:38,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=103493.33333333333, ans=0.0 2023-11-18 06:38:58,578 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.211e+00 2023-11-18 06:39:02,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=103626.66666666667, ans=0.1 2023-11-18 06:39:03,745 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:39:09,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=103693.33333333333, ans=0.125 2023-11-18 06:39:17,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=103693.33333333333, ans=0.1 2023-11-18 06:39:27,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=103760.0, ans=0.125 2023-11-18 06:39:32,482 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 3550, loss[loss=0.1728, simple_loss=0.1841, pruned_loss=0.07054, audio_tagging_loss=0.01017, over 15390.00 frames. ], tot_loss[loss=0.1331, simple_loss=0.1386, pruned_loss=0.05122, audio_tagging_loss=0.01265, over 3047487.08 frames. ], batch size: 55, lr: 2.91e-02, grad_scale: 32.0
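The lr printed with each tot_loss entry (3.00e-02 at the top of this stretch, 2.91e-02 here) comes from the Eden scheduler configured with base_lr=0.045, lr_batches=7500, lr_epochs=3.5. A sketch of the schedule as implemented in icefall's optim.py, to the best of my understanding; treat the exact formula as an assumption:

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Both factors decay smoothly with a -0.25 power in batch and epoch.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# At the start of training both factors are ~1, so lr ~ base_lr = 0.045;
# by this point in epoch 2 it has decayed to ~2.9e-02 and loses roughly
# 0.01e-02 every few hundred batches, matching the drift in these entries.
print(eden_lr(0.045, batch=0, epoch=0.0))  # -> 0.045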
2023-11-18 06:39:58,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=103960.0, ans=0.0 2023-11-18 06:40:02,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=103960.0, ans=0.0 2023-11-18 06:40:11,038 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.37 vs. limit=10.0 2023-11-18 06:40:12,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=104026.66666666667, ans=0.05 2023-11-18 06:40:15,311 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.977e+01 9.988e+01 1.160e+02 1.284e+02 2.391e+02, threshold=2.320e+02, percent-clipped=0.0 2023-11-18 06:40:28,310 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 3600, loss[loss=0.1398, simple_loss=0.143, pruned_loss=0.05555, audio_tagging_loss=0.01271, over 15094.00 frames. ], tot_loss[loss=0.1325, simple_loss=0.1381, pruned_loss=0.05077, audio_tagging_loss=0.01267, over 3047028.49 frames. ], batch size: 57, lr: 2.90e-02, grad_scale: 32.0 2023-11-18 06:40:47,000 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.18 vs. limit=15.0 2023-11-18 06:40:55,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=104293.33333333333, ans=0.125 2023-11-18 06:41:04,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=104360.0, ans=0.125 2023-11-18 06:41:06,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=104360.0, ans=0.04949747468305833 2023-11-18 06:41:24,472 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 3650, loss[loss=0.1369, simple_loss=0.1503, pruned_loss=0.05074, audio_tagging_loss=0.011, over 15585.00 frames. ], tot_loss[loss=0.1328, simple_loss=0.1383, pruned_loss=0.05093, audio_tagging_loss=0.0127, over 3049360.06 frames. ], batch size: 56, lr: 2.90e-02, grad_scale: 32.0 2023-11-18 06:41:54,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=104626.66666666667, ans=0.125 2023-11-18 06:42:07,179 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.564e+01 1.051e+02 1.152e+02 1.363e+02 2.191e+02, threshold=2.304e+02, percent-clipped=0.0 2023-11-18 06:42:09,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=104760.0, ans=0.1 2023-11-18 06:42:14,402 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.32 vs. limit=10.0 2023-11-18 06:42:18,885 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 06:42:19,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=104760.0, ans=0.125 2023-11-18 06:42:20,860 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 3700, loss[loss=0.1668, simple_loss=0.1809, pruned_loss=0.06648, audio_tagging_loss=0.009918, over 15052.00 frames.
], tot_loss[loss=0.1328, simple_loss=0.1382, pruned_loss=0.05094, audio_tagging_loss=0.01278, over 3053723.41 frames. ], batch size: 56, lr: 2.90e-02, grad_scale: 32.0 2023-11-18 06:42:26,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=104826.66666666667, ans=0.1 2023-11-18 06:42:40,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=104893.33333333333, ans=0.2 2023-11-18 06:43:17,364 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 3750, loss[loss=0.1548, simple_loss=0.1678, pruned_loss=0.05931, audio_tagging_loss=0.0116, over 15824.00 frames. ], tot_loss[loss=0.1335, simple_loss=0.1389, pruned_loss=0.05131, audio_tagging_loss=0.01275, over 3055568.71 frames. ], batch size: 58, lr: 2.89e-02, grad_scale: 32.0 2023-11-18 06:43:47,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=105293.33333333333, ans=0.125 2023-11-18 06:43:56,413 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:44:00,706 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.685e+01 1.091e+02 1.248e+02 1.454e+02 2.022e+02, threshold=2.495e+02, percent-clipped=0.0 2023-11-18 06:44:08,846 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.67 vs. limit=15.0 2023-11-18 06:44:12,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=105426.66666666667, ans=0.2 2023-11-18 06:44:14,161 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 3800, loss[loss=0.1004, simple_loss=0.1051, pruned_loss=0.03507, audio_tagging_loss=0.01278, over 15234.00 frames. ], tot_loss[loss=0.1323, simple_loss=0.1373, pruned_loss=0.05072, audio_tagging_loss=0.01291, over 3058895.07 frames. ], batch size: 57, lr: 2.89e-02, grad_scale: 32.0 2023-11-18 06:44:38,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=105626.66666666667, ans=0.1 2023-11-18 06:44:41,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=105626.66666666667, ans=0.125 2023-11-18 06:44:42,189 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.26 vs. limit=15.0 2023-11-18 06:44:44,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=105626.66666666667, ans=0.125 2023-11-18 06:44:48,588 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.87 vs. limit=15.0 2023-11-18 06:44:49,773 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.14 vs. 
limit=22.5 2023-11-18 06:45:10,938 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 3850, loss[loss=0.1871, simple_loss=0.2052, pruned_loss=0.07393, audio_tagging_loss=0.01058, over 15750.00 frames. ], tot_loss[loss=0.1327, simple_loss=0.1377, pruned_loss=0.0509, audio_tagging_loss=0.01292, over 3061059.93 frames. ], batch size: 55, lr: 2.88e-02, grad_scale: 32.0 2023-11-18 06:45:18,102 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.44 vs. limit=15.0 2023-11-18 06:45:24,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=105893.33333333333, ans=0.125 2023-11-18 06:45:24,484 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.05 vs. limit=15.0 2023-11-18 06:45:42,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=105960.0, ans=0.125 2023-11-18 06:45:53,830 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.399e+01 1.026e+02 1.153e+02 1.299e+02 2.070e+02, threshold=2.305e+02, percent-clipped=0.0 2023-11-18 06:45:57,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=106093.33333333333, ans=0.1 2023-11-18 06:46:05,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=106160.0, ans=0.0 2023-11-18 06:46:06,607 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 3900, loss[loss=0.1325, simple_loss=0.138, pruned_loss=0.05348, audio_tagging_loss=0.01008, over 15217.00 frames. ], tot_loss[loss=0.1331, simple_loss=0.138, pruned_loss=0.05111, audio_tagging_loss=0.01303, over 3056358.48 frames. ], batch size: 57, lr: 2.88e-02, grad_scale: 32.0 2023-11-18 06:46:17,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=106226.66666666667, ans=15.0 2023-11-18 06:46:21,734 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 06:46:38,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=106293.33333333333, ans=0.125 2023-11-18 06:46:56,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=106426.66666666667, ans=0.07 2023-11-18 06:47:03,428 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 3950, loss[loss=0.1701, simple_loss=0.1885, pruned_loss=0.06646, audio_tagging_loss=0.009349, over 15910.00 frames. ], tot_loss[loss=0.1336, simple_loss=0.1383, pruned_loss=0.05128, audio_tagging_loss=0.01319, over 3058118.94 frames. ], batch size: 56, lr: 2.88e-02, grad_scale: 32.0 2023-11-18 06:47:03,898 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.72 vs. 
limit=6.0 2023-11-18 06:47:12,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=106493.33333333333, ans=0.1 2023-11-18 06:47:12,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=106493.33333333333, ans=0.125 2023-11-18 06:47:18,781 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=4.243e+00 2023-11-18 06:47:48,874 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.990e+01 1.025e+02 1.127e+02 1.249e+02 1.832e+02, threshold=2.254e+02, percent-clipped=0.0 2023-11-18 06:47:53,680 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.42 vs. limit=15.0 2023-11-18 06:48:02,293 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 4000, loss[loss=0.1728, simple_loss=0.1861, pruned_loss=0.06861, audio_tagging_loss=0.0111, over 15844.00 frames. ], tot_loss[loss=0.1335, simple_loss=0.1382, pruned_loss=0.05117, audio_tagging_loss=0.01325, over 3056280.12 frames. ], batch size: 57, lr: 2.87e-02, grad_scale: 32.0 2023-11-18 06:48:11,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=106826.66666666667, ans=0.125 2023-11-18 06:48:12,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=106893.33333333333, ans=0.125 2023-11-18 06:48:16,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=106893.33333333333, ans=0.125 2023-11-18 06:48:32,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=106960.0, ans=0.2 2023-11-18 06:48:35,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=107026.66666666667, ans=0.1 2023-11-18 06:48:37,933 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.79 vs. limit=12.0 2023-11-18 06:48:58,676 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 4050, loss[loss=0.1375, simple_loss=0.1427, pruned_loss=0.0535, audio_tagging_loss=0.01259, over 14471.00 frames. ], tot_loss[loss=0.1346, simple_loss=0.1393, pruned_loss=0.0517, audio_tagging_loss=0.01323, over 3061982.86 frames. ], batch size: 55, lr: 2.87e-02, grad_scale: 32.0 2023-11-18 06:48:59,817 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
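The recurring train_asr.py:1319 WARNINGs drop AudioSet placeholder cuts whose token sequence is longer than the subsampled frame sequence: a 1-second cut has 100 feature frames, the convolutional frontend leaves 23, and the dummy transcript tokenizes to 24 BPE pieces, so a transducer could not align the full text. A sketch of the filter; the subsampling arithmetic reproduces the logged 100 -> 23 but is an assumption, not the literal code:

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Assumed frontend length formula (matches 100 frames -> 23 above).
    T = ((num_frames - 7) // 2 + 1) // 2
    # Exclude cuts whose text is longer than the encoder output.
    return T >= num_tokens

print(keep_cut(num_frames=100, num_tokens=24))  # -> False: excluded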
2023-11-18 06:49:04,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=107160.0, ans=0.0 2023-11-18 06:49:07,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=107160.0, ans=0.125 2023-11-18 06:49:13,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=107226.66666666667, ans=0.125 2023-11-18 06:49:25,504 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.79 vs. limit=10.0 2023-11-18 06:49:32,704 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.24 vs. limit=22.5 2023-11-18 06:49:33,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=107360.0, ans=0.0 2023-11-18 06:49:41,710 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.845e+01 1.077e+02 1.199e+02 1.331e+02 2.496e+02, threshold=2.397e+02, percent-clipped=1.0 2023-11-18 06:49:41,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=107360.0, ans=0.0 2023-11-18 06:49:55,684 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 4100, loss[loss=0.1728, simple_loss=0.1876, pruned_loss=0.06762, audio_tagging_loss=0.01139, over 16163.00 frames. ], tot_loss[loss=0.1357, simple_loss=0.1407, pruned_loss=0.05224, audio_tagging_loss=0.0131, over 3056150.29 frames. ], batch size: 59, lr: 2.87e-02, grad_scale: 32.0 2023-11-18 06:50:06,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=107560.0, ans=0.125 2023-11-18 06:50:07,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=107560.0, ans=0.0 2023-11-18 06:50:26,263 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.12 vs. limit=12.0 2023-11-18 06:50:31,568 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0 2023-11-18 06:50:39,263 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0 2023-11-18 06:50:51,924 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 4150, loss[loss=0.1274, simple_loss=0.1405, pruned_loss=0.04722, audio_tagging_loss=0.00994, over 15221.00 frames. ], tot_loss[loss=0.135, simple_loss=0.1404, pruned_loss=0.0519, audio_tagging_loss=0.01295, over 3052506.49 frames. ], batch size: 56, lr: 2.86e-02, grad_scale: 32.0 2023-11-18 06:50:52,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=107826.66666666667, ans=0.125 2023-11-18 06:51:05,901 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.23 vs.
limit=15.0 2023-11-18 06:51:06,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=107893.33333333333, ans=0.0 2023-11-18 06:51:31,967 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:51:35,152 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.314e+01 1.032e+02 1.149e+02 1.336e+02 2.371e+02, threshold=2.297e+02, percent-clipped=0.0 2023-11-18 06:51:40,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=108093.33333333333, ans=0.1 2023-11-18 06:51:40,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=108093.33333333333, ans=0.2 2023-11-18 06:51:43,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=108093.33333333333, ans=0.2 2023-11-18 06:51:48,694 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 4200, loss[loss=0.1034, simple_loss=0.1036, pruned_loss=0.03679, audio_tagging_loss=0.01481, over 13808.00 frames. ], tot_loss[loss=0.1336, simple_loss=0.1391, pruned_loss=0.05124, audio_tagging_loss=0.01283, over 3052188.95 frames. ], batch size: 54, lr: 2.86e-02, grad_scale: 32.0 2023-11-18 06:52:00,011 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.54 vs. limit=15.0 2023-11-18 06:52:01,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=108226.66666666667, ans=0.2 2023-11-18 06:52:12,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=108293.33333333333, ans=0.0 2023-11-18 06:52:13,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=108293.33333333333, ans=0.09899494936611666 2023-11-18 06:52:27,927 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.64 vs. limit=12.0 2023-11-18 06:52:33,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=108426.66666666667, ans=0.2 2023-11-18 06:52:44,349 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 4250, loss[loss=0.1307, simple_loss=0.1295, pruned_loss=0.05428, audio_tagging_loss=0.01164, over 14572.00 frames. ], tot_loss[loss=0.1323, simple_loss=0.138, pruned_loss=0.05055, audio_tagging_loss=0.01273, over 3048685.78 frames. 
], batch size: 55, lr: 2.85e-02, grad_scale: 32.0 2023-11-18 06:52:46,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=108493.33333333333, ans=0.0 2023-11-18 06:53:11,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=108626.66666666667, ans=0.1 2023-11-18 06:53:11,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=108626.66666666667, ans=0.125 2023-11-18 06:53:26,873 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 9.019e+01 1.076e+02 1.189e+02 1.301e+02 1.957e+02, threshold=2.378e+02, percent-clipped=0.0 2023-11-18 06:53:41,497 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 4300, loss[loss=0.1104, simple_loss=0.1224, pruned_loss=0.04009, audio_tagging_loss=0.009137, over 16124.00 frames. ], tot_loss[loss=0.1322, simple_loss=0.1378, pruned_loss=0.05058, audio_tagging_loss=0.01271, over 3043065.02 frames. ], batch size: 60, lr: 2.85e-02, grad_scale: 32.0 2023-11-18 06:53:44,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=108826.66666666667, ans=0.125 2023-11-18 06:54:11,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=108960.0, ans=0.0 2023-11-18 06:54:16,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=109026.66666666667, ans=0.125 2023-11-18 06:54:32,636 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.76 vs. limit=15.0 2023-11-18 06:54:36,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=109160.0, ans=0.95 2023-11-18 06:54:37,458 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 4350, loss[loss=0.1388, simple_loss=0.1422, pruned_loss=0.05219, audio_tagging_loss=0.01557, over 15205.00 frames. ], tot_loss[loss=0.1314, simple_loss=0.137, pruned_loss=0.05031, audio_tagging_loss=0.01259, over 3038932.61 frames. ], batch size: 56, lr: 2.85e-02, grad_scale: 32.0 2023-11-18 06:54:43,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=109160.0, ans=0.125 2023-11-18 06:54:49,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=109226.66666666667, ans=0.2 2023-11-18 06:54:51,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=109226.66666666667, ans=0.0 2023-11-18 06:54:57,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=109226.66666666667, ans=0.125 2023-11-18 06:55:01,085 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.45 vs. limit=15.0 2023-11-18 06:55:03,388 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.34 vs. 
limit=15.0 2023-11-18 06:55:04,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=109293.33333333333, ans=0.125 2023-11-18 06:55:19,774 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 06:55:20,482 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.140e+01 1.013e+02 1.155e+02 1.315e+02 2.105e+02, threshold=2.309e+02, percent-clipped=0.0 2023-11-18 06:55:29,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=109426.66666666667, ans=0.125 2023-11-18 06:55:33,442 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 4400, loss[loss=0.1364, simple_loss=0.135, pruned_loss=0.05501, audio_tagging_loss=0.01393, over 14970.00 frames. ], tot_loss[loss=0.1316, simple_loss=0.1372, pruned_loss=0.05044, audio_tagging_loss=0.0126, over 3037464.04 frames. ], batch size: 54, lr: 2.84e-02, grad_scale: 32.0 2023-11-18 06:55:42,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=109493.33333333333, ans=0.0 2023-11-18 06:55:55,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=109560.0, ans=0.125 2023-11-18 06:56:01,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=109626.66666666667, ans=0.1 2023-11-18 06:56:07,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=109693.33333333333, ans=0.125 2023-11-18 06:56:16,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.14 vs. limit=15.0 2023-11-18 06:56:29,199 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 4450, loss[loss=0.109, simple_loss=0.1056, pruned_loss=0.04089, audio_tagging_loss=0.01533, over 13823.00 frames. ], tot_loss[loss=0.1322, simple_loss=0.1379, pruned_loss=0.05063, audio_tagging_loss=0.01265, over 3043560.56 frames. ], batch size: 54, lr: 2.84e-02, grad_scale: 32.0 2023-11-18 06:56:56,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=109960.0, ans=0.0 2023-11-18 06:57:01,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=109960.0, ans=0.125 2023-11-18 06:57:09,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=110026.66666666667, ans=0.0 2023-11-18 06:57:11,964 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.260e+01 1.110e+02 1.220e+02 1.457e+02 2.260e+02, threshold=2.440e+02, percent-clipped=0.0 2023-11-18 06:57:13,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=110093.33333333333, ans=0.0 2023-11-18 06:57:26,476 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 4500, loss[loss=0.1626, simple_loss=0.1743, pruned_loss=0.06253, audio_tagging_loss=0.0129, over 15164.00 frames. ], tot_loss[loss=0.1314, simple_loss=0.1371, pruned_loss=0.05027, audio_tagging_loss=0.0126, over 3048128.92 frames. ], batch size: 58, lr: 2.84e-02, grad_scale: 32.0
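In the optim.py:476 entries, the five grad-norm values are the min/25%/median/75%/max of recently observed gradient norms, and in every entry here the threshold equals Clipping_scale times the median (e.g. 2.0 * 1.220e+02 = 2.440e+02 just above). A sketch of that rule over an assumed history buffer; inferred from the logged numbers, not the exact icefall implementation:

import torch

def clip_with_median_threshold(parameters, recent_norms, clipping_scale=2.0):
    # Quartiles (min/25%/50%/75%/max) of recent gradient norms, as logged.
    norms = torch.tensor(recent_norms)
    quartiles = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * float(quartiles[2])  # 2.0 x median
    total_norm = torch.nn.utils.clip_grad_norm_(parameters, max_norm=threshold)
    clipped = float(total_norm) > threshold  # feeds the percent-clipped stat
    return quartiles, threshold, clipped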
2023-11-18 06:57:27,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=110160.0, ans=0.02 2023-11-18 06:57:33,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=110160.0, ans=0.0 2023-11-18 06:57:50,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=110293.33333333333, ans=0.1 2023-11-18 06:57:58,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=110360.0, ans=0.0 2023-11-18 06:58:02,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=110360.0, ans=0.0 2023-11-18 06:58:03,398 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.96 vs. limit=10.0 2023-11-18 06:58:05,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=110360.0, ans=0.2 2023-11-18 06:58:07,609 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 06:58:15,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=110426.66666666667, ans=0.125 2023-11-18 06:58:17,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=110426.66666666667, ans=0.125 2023-11-18 06:58:22,425 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 4550, loss[loss=0.1478, simple_loss=0.1474, pruned_loss=0.05917, audio_tagging_loss=0.01492, over 14558.00 frames. ], tot_loss[loss=0.1324, simple_loss=0.1382, pruned_loss=0.05061, audio_tagging_loss=0.01267, over 3043392.38 frames. ], batch size: 55, lr: 2.83e-02, grad_scale: 32.0 2023-11-18 06:58:22,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=110493.33333333333, ans=0.0 2023-11-18 06:58:59,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=110693.33333333333, ans=0.125 2023-11-18 06:59:05,467 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.958e+01 1.009e+02 1.148e+02 1.280e+02 1.877e+02, threshold=2.295e+02, percent-clipped=0.0 2023-11-18 06:59:05,509 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:59:18,756 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 4600, loss[loss=0.1299, simple_loss=0.1481, pruned_loss=0.04517, audio_tagging_loss=0.01068, over 14760.00 frames. ], tot_loss[loss=0.1319, simple_loss=0.1377, pruned_loss=0.05036, audio_tagging_loss=0.01271, over 3039921.71 frames.
], batch size: 56, lr: 2.83e-02, grad_scale: 32.0 2023-11-18 06:59:27,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=110826.66666666667, ans=0.125 2023-11-18 06:59:47,710 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.14 vs. limit=15.0 2023-11-18 06:59:58,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=111026.66666666667, ans=0.2 2023-11-18 07:00:03,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=111093.33333333333, ans=0.07 2023-11-18 07:00:06,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=111093.33333333333, ans=22.5 2023-11-18 07:00:08,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=111093.33333333333, ans=0.1 2023-11-18 07:00:15,440 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 4650, loss[loss=0.1701, simple_loss=0.1788, pruned_loss=0.068, audio_tagging_loss=0.01271, over 15720.00 frames. ], tot_loss[loss=0.1316, simple_loss=0.1369, pruned_loss=0.05031, audio_tagging_loss=0.01279, over 3037390.13 frames. ], batch size: 58, lr: 2.83e-02, grad_scale: 32.0 2023-11-18 07:00:18,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=111160.0, ans=0.0 2023-11-18 07:00:38,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=111293.33333333333, ans=0.025 2023-11-18 07:00:58,023 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.738e+01 1.062e+02 1.161e+02 1.332e+02 2.161e+02, threshold=2.322e+02, percent-clipped=0.0 2023-11-18 07:01:07,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=111426.66666666667, ans=0.125 2023-11-18 07:01:10,900 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 4700, loss[loss=0.1045, simple_loss=0.1013, pruned_loss=0.03796, audio_tagging_loss=0.01591, over 14823.00 frames. ], tot_loss[loss=0.1321, simple_loss=0.1378, pruned_loss=0.05036, audio_tagging_loss=0.01287, over 3041554.75 frames. 
], batch size: 59, lr: 2.82e-02, grad_scale: 32.0 2023-11-18 07:01:21,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=111560.0, ans=0.0 2023-11-18 07:01:27,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=111560.0, ans=0.0 2023-11-18 07:01:33,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=111626.66666666667, ans=0.035 2023-11-18 07:01:42,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=111626.66666666667, ans=0.2 2023-11-18 07:01:53,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=111693.33333333333, ans=0.1 2023-11-18 07:01:54,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=111760.0, ans=0.125 2023-11-18 07:01:58,395 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.78 vs. limit=15.0 2023-11-18 07:02:00,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=111760.0, ans=0.125 2023-11-18 07:02:06,893 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 4750, loss[loss=0.14, simple_loss=0.1409, pruned_loss=0.0506, audio_tagging_loss=0.01894, over 14565.00 frames. ], tot_loss[loss=0.1319, simple_loss=0.1372, pruned_loss=0.05029, audio_tagging_loss=0.01306, over 3039011.64 frames. ], batch size: 57, lr: 2.82e-02, grad_scale: 32.0 2023-11-18 07:02:10,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=111826.66666666667, ans=0.0 2023-11-18 07:02:11,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=111826.66666666667, ans=0.2 2023-11-18 07:02:11,700 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.59 vs. limit=22.5 2023-11-18 07:02:17,547 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.86 vs. limit=15.0 2023-11-18 07:02:49,768 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.988e+01 1.063e+02 1.146e+02 1.305e+02 1.876e+02, threshold=2.292e+02, percent-clipped=0.0 2023-11-18 07:02:53,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=112093.33333333333, ans=0.125 2023-11-18 07:03:03,776 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 4800, loss[loss=0.1439, simple_loss=0.1571, pruned_loss=0.05586, audio_tagging_loss=0.009529, over 15626.00 frames. ], tot_loss[loss=0.1325, simple_loss=0.1379, pruned_loss=0.0504, audio_tagging_loss=0.01312, over 3041075.90 frames. 
], batch size: 56, lr: 2.82e-02, grad_scale: 32.0 2023-11-18 07:03:06,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=112160.0, ans=0.1 2023-11-18 07:03:14,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=112226.66666666667, ans=0.1 2023-11-18 07:03:18,831 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.04 vs. limit=15.0 2023-11-18 07:03:34,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=112293.33333333333, ans=0.0 2023-11-18 07:03:41,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=112360.0, ans=0.125 2023-11-18 07:03:56,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=112426.66666666667, ans=0.2 2023-11-18 07:03:59,944 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 4850, loss[loss=0.1084, simple_loss=0.1182, pruned_loss=0.03887, audio_tagging_loss=0.01045, over 15390.00 frames. ], tot_loss[loss=0.1316, simple_loss=0.1365, pruned_loss=0.04999, audio_tagging_loss=0.01335, over 3038876.19 frames. ], batch size: 59, lr: 2.81e-02, grad_scale: 32.0 2023-11-18 07:04:15,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=112560.0, ans=0.125 2023-11-18 07:04:18,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=112560.0, ans=0.1 2023-11-18 07:04:30,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=112626.66666666667, ans=0.125 2023-11-18 07:04:32,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=112693.33333333333, ans=0.125 2023-11-18 07:04:42,629 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 1.047e+02 1.164e+02 1.344e+02 1.766e+02, threshold=2.328e+02, percent-clipped=0.0 2023-11-18 07:04:49,780 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.64 vs. limit=15.0 2023-11-18 07:04:56,042 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 4900, loss[loss=0.1137, simple_loss=0.1298, pruned_loss=0.04005, audio_tagging_loss=0.008778, over 15812.00 frames. ], tot_loss[loss=0.1309, simple_loss=0.1363, pruned_loss=0.04955, audio_tagging_loss=0.01318, over 3041185.26 frames. ], batch size: 59, lr: 2.81e-02, grad_scale: 32.0 2023-11-18 07:05:04,743 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.26 vs. limit=22.5 2023-11-18 07:05:18,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=112960.0, ans=0.2 2023-11-18 07:05:20,985 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.52 vs. 
limit=15.0 2023-11-18 07:05:23,591 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0 2023-11-18 07:05:37,011 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.73 vs. limit=10.0 2023-11-18 07:05:38,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=113026.66666666667, ans=0.125 2023-11-18 07:05:51,912 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 4950, loss[loss=0.1399, simple_loss=0.1533, pruned_loss=0.05049, audio_tagging_loss=0.01274, over 15827.00 frames. ], tot_loss[loss=0.1315, simple_loss=0.1369, pruned_loss=0.05002, audio_tagging_loss=0.01304, over 3045032.41 frames. ], batch size: 57, lr: 2.81e-02, grad_scale: 64.0 2023-11-18 07:06:00,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=113160.0, ans=0.05 2023-11-18 07:06:09,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=113226.66666666667, ans=0.0 2023-11-18 07:06:34,752 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.114e+01 1.049e+02 1.181e+02 1.339e+02 2.582e+02, threshold=2.362e+02, percent-clipped=1.0 2023-11-18 07:06:41,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=113426.66666666667, ans=0.125 2023-11-18 07:06:43,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=113426.66666666667, ans=0.0 2023-11-18 07:06:48,202 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 5000, loss[loss=0.1248, simple_loss=0.1284, pruned_loss=0.04949, audio_tagging_loss=0.01107, over 15658.00 frames. ], tot_loss[loss=0.1313, simple_loss=0.137, pruned_loss=0.04994, audio_tagging_loss=0.01282, over 3041697.55 frames. ], batch size: 57, lr: 2.80e-02, grad_scale: 64.0 2023-11-18 07:07:00,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=113560.0, ans=0.2 2023-11-18 07:07:08,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=113560.0, ans=0.0 2023-11-18 07:07:10,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=113626.66666666667, ans=0.125 2023-11-18 07:07:20,816 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.38 vs. limit=22.5 2023-11-18 07:07:31,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=113693.33333333333, ans=0.125 2023-11-18 07:07:44,827 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 5050, loss[loss=0.1326, simple_loss=0.1434, pruned_loss=0.05002, audio_tagging_loss=0.0109, over 15701.00 frames. ], tot_loss[loss=0.132, simple_loss=0.1383, pruned_loss=0.05019, audio_tagging_loss=0.01268, over 3043817.11 frames. 
], batch size: 56, lr: 2.80e-02, grad_scale: 64.0 2023-11-18 07:08:25,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=114026.66666666667, ans=15.0 2023-11-18 07:08:27,547 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.042e+01 1.016e+02 1.164e+02 1.342e+02 1.810e+02, threshold=2.328e+02, percent-clipped=0.0 2023-11-18 07:08:38,669 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.85 vs. limit=22.5 2023-11-18 07:08:41,050 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 5100, loss[loss=0.1208, simple_loss=0.1246, pruned_loss=0.04456, audio_tagging_loss=0.01392, over 15673.00 frames. ], tot_loss[loss=0.1316, simple_loss=0.1377, pruned_loss=0.05005, audio_tagging_loss=0.01265, over 3037674.32 frames. ], batch size: 60, lr: 2.79e-02, grad_scale: 64.0 2023-11-18 07:08:46,485 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0 2023-11-18 07:08:49,710 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.52 vs. limit=15.0 2023-11-18 07:08:55,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=114226.66666666667, ans=0.0 2023-11-18 07:08:57,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=114226.66666666667, ans=0.125 2023-11-18 07:09:08,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=114293.33333333333, ans=0.0 2023-11-18 07:09:24,442 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.07 vs. limit=15.0 2023-11-18 07:09:30,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=114426.66666666667, ans=0.125 2023-11-18 07:09:37,383 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 5150, loss[loss=0.147, simple_loss=0.1478, pruned_loss=0.06219, audio_tagging_loss=0.01091, over 15396.00 frames. ], tot_loss[loss=0.1321, simple_loss=0.138, pruned_loss=0.05034, audio_tagging_loss=0.01271, over 3040119.41 frames. 
], batch size: 59, lr: 2.79e-02, grad_scale: 16.0 2023-11-18 07:09:50,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=114560.0, ans=0.125 2023-11-18 07:10:02,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=114626.66666666667, ans=0.125 2023-11-18 07:10:07,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=114626.66666666667, ans=0.1 2023-11-18 07:10:08,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=114626.66666666667, ans=0.0 2023-11-18 07:10:10,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=114693.33333333333, ans=0.0 2023-11-18 07:10:17,609 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.37 vs. limit=15.0 2023-11-18 07:10:22,535 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.209e+01 1.018e+02 1.157e+02 1.320e+02 3.492e+02, threshold=2.315e+02, percent-clipped=2.0 2023-11-18 07:10:23,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=114760.0, ans=0.125 2023-11-18 07:10:33,170 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.535e+00 2023-11-18 07:10:33,997 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 5200, loss[loss=0.155, simple_loss=0.1639, pruned_loss=0.06296, audio_tagging_loss=0.01009, over 16007.00 frames. ], tot_loss[loss=0.1333, simple_loss=0.1395, pruned_loss=0.05089, audio_tagging_loss=0.01263, over 3041506.74 frames. ], batch size: 58, lr: 2.79e-02, grad_scale: 32.0 2023-11-18 07:10:40,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=114826.66666666667, ans=0.125 2023-11-18 07:10:55,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=114960.0, ans=0.1 2023-11-18 07:11:10,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=115026.66666666667, ans=0.1 2023-11-18 07:11:12,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=115026.66666666667, ans=0.1 2023-11-18 07:11:30,241 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 5250, loss[loss=0.1151, simple_loss=0.127, pruned_loss=0.03833, audio_tagging_loss=0.01333, over 15138.00 frames. ], tot_loss[loss=0.1329, simple_loss=0.1393, pruned_loss=0.05078, audio_tagging_loss=0.01253, over 3042920.85 frames. 
], batch size: 56, lr: 2.78e-02, grad_scale: 32.0 2023-11-18 07:11:35,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=115160.0, ans=0.0 2023-11-18 07:11:36,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=115160.0, ans=0.125 2023-11-18 07:12:13,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=115360.0, ans=0.0 2023-11-18 07:12:13,906 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.00 vs. limit=15.0 2023-11-18 07:12:15,368 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.068e+01 1.019e+02 1.120e+02 1.285e+02 1.660e+02, threshold=2.240e+02, percent-clipped=0.0 2023-11-18 07:12:26,631 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 5300, loss[loss=0.1012, simple_loss=0.1151, pruned_loss=0.03157, audio_tagging_loss=0.01209, over 15910.00 frames. ], tot_loss[loss=0.1327, simple_loss=0.139, pruned_loss=0.05071, audio_tagging_loss=0.01246, over 3048384.90 frames. ], batch size: 57, lr: 2.78e-02, grad_scale: 32.0 2023-11-18 07:12:27,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=115493.33333333333, ans=0.0 2023-11-18 07:12:50,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=115626.66666666667, ans=0.1 2023-11-18 07:12:51,856 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=1.92 vs. limit=15.0 2023-11-18 07:12:52,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=115626.66666666667, ans=22.5 2023-11-18 07:12:55,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=115626.66666666667, ans=0.1 2023-11-18 07:12:57,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=115626.66666666667, ans=0.2 2023-11-18 07:12:57,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=115626.66666666667, ans=0.0 2023-11-18 07:13:15,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=115760.0, ans=0.1 2023-11-18 07:13:22,360 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 5350, loss[loss=0.1313, simple_loss=0.1339, pruned_loss=0.05405, audio_tagging_loss=0.01032, over 13795.00 frames. ], tot_loss[loss=0.134, simple_loss=0.1406, pruned_loss=0.05132, audio_tagging_loss=0.01238, over 3046917.98 frames. ], batch size: 54, lr: 2.78e-02, grad_scale: 32.0 2023-11-18 07:13:24,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=115826.66666666667, ans=0.125 2023-11-18 07:13:30,822 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.27 vs. 
limit=15.0 2023-11-18 07:13:45,518 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.95 vs. limit=15.0 2023-11-18 07:13:54,471 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.63 vs. limit=22.5 2023-11-18 07:13:56,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=116026.66666666667, ans=0.0 2023-11-18 07:14:02,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=116026.66666666667, ans=0.0 2023-11-18 07:14:04,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=116026.66666666667, ans=10.0 2023-11-18 07:14:08,567 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.439e+01 1.052e+02 1.204e+02 1.359e+02 2.060e+02, threshold=2.407e+02, percent-clipped=0.0 2023-11-18 07:14:20,306 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 5400, loss[loss=0.1606, simple_loss=0.1689, pruned_loss=0.06199, audio_tagging_loss=0.01423, over 15292.00 frames. ], tot_loss[loss=0.135, simple_loss=0.1415, pruned_loss=0.05174, audio_tagging_loss=0.01247, over 3052805.53 frames. ], batch size: 56, lr: 2.77e-02, grad_scale: 32.0 2023-11-18 07:14:25,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=116160.0, ans=0.1 2023-11-18 07:14:27,567 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:14:33,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=116226.66666666667, ans=0.1 2023-11-18 07:14:59,506 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.16 vs. limit=22.5 2023-11-18 07:15:07,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=116426.66666666667, ans=0.125 2023-11-18 07:15:14,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=116426.66666666667, ans=0.125 2023-11-18 07:15:16,454 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 5450, loss[loss=0.1169, simple_loss=0.1237, pruned_loss=0.04453, audio_tagging_loss=0.0105, over 14804.00 frames. ], tot_loss[loss=0.1337, simple_loss=0.1401, pruned_loss=0.05106, audio_tagging_loss=0.01261, over 3049198.04 frames. ], batch size: 57, lr: 2.77e-02, grad_scale: 32.0 2023-11-18 07:15:32,124 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.18 vs. limit=15.0 2023-11-18 07:15:55,726 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.11 vs. 
limit=15.0 2023-11-18 07:15:59,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=116693.33333333333, ans=0.1 2023-11-18 07:15:59,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=116693.33333333333, ans=0.125 2023-11-18 07:16:01,099 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.81 vs. limit=15.0 2023-11-18 07:16:01,587 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.036e+01 1.012e+02 1.167e+02 1.341e+02 1.969e+02, threshold=2.335e+02, percent-clipped=0.0 2023-11-18 07:16:01,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=116760.0, ans=0.125 2023-11-18 07:16:03,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=116760.0, ans=15.0 2023-11-18 07:16:06,566 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.63 vs. limit=15.0 2023-11-18 07:16:08,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=116760.0, ans=0.0 2023-11-18 07:16:10,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=116760.0, ans=0.125 2023-11-18 07:16:12,360 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 5500, loss[loss=0.1261, simple_loss=0.1331, pruned_loss=0.0464, audio_tagging_loss=0.01312, over 14661.00 frames. ], tot_loss[loss=0.1334, simple_loss=0.1397, pruned_loss=0.05086, audio_tagging_loss=0.01269, over 3046985.53 frames. ], batch size: 53, lr: 2.77e-02, grad_scale: 32.0 2023-11-18 07:16:12,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=116826.66666666667, ans=0.125 2023-11-18 07:16:23,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=116893.33333333333, ans=0.2 2023-11-18 07:16:46,566 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.60 vs. limit=15.0 2023-11-18 07:16:47,676 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.19 vs. limit=10.0 2023-11-18 07:17:08,012 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 5550, loss[loss=0.09839, simple_loss=0.1104, pruned_loss=0.03068, audio_tagging_loss=0.01253, over 15595.00 frames. ], tot_loss[loss=0.1337, simple_loss=0.1398, pruned_loss=0.05095, audio_tagging_loss=0.01285, over 3038310.16 frames. 
], batch size: 58, lr: 2.76e-02, grad_scale: 32.0 2023-11-18 07:17:09,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=117160.0, ans=0.0 2023-11-18 07:17:19,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=117226.66666666667, ans=0.1 2023-11-18 07:17:20,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=117226.66666666667, ans=0.2 2023-11-18 07:17:26,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=117226.66666666667, ans=0.125 2023-11-18 07:17:27,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=117226.66666666667, ans=0.07 2023-11-18 07:17:48,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=117360.0, ans=0.5 2023-11-18 07:17:53,712 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.091e+01 1.048e+02 1.161e+02 1.291e+02 1.886e+02, threshold=2.323e+02, percent-clipped=0.0 2023-11-18 07:18:05,494 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 5600, loss[loss=0.1599, simple_loss=0.1738, pruned_loss=0.06137, audio_tagging_loss=0.01162, over 15402.00 frames. ], tot_loss[loss=0.1337, simple_loss=0.1399, pruned_loss=0.05085, audio_tagging_loss=0.01291, over 3043581.06 frames. ], batch size: 57, lr: 2.76e-02, grad_scale: 32.0 2023-11-18 07:18:05,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=117493.33333333333, ans=0.1 2023-11-18 07:18:08,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=117493.33333333333, ans=0.0 2023-11-18 07:18:10,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=117493.33333333333, ans=0.0 2023-11-18 07:18:29,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=117626.66666666667, ans=0.0 2023-11-18 07:18:37,058 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.21 vs. limit=22.5 2023-11-18 07:18:38,165 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.98 vs. limit=22.5 2023-11-18 07:18:45,156 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 07:18:53,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=117760.0, ans=0.0 2023-11-18 07:18:58,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=117760.0, ans=0.125 2023-11-18 07:19:01,201 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 5650, loss[loss=0.1193, simple_loss=0.1164, pruned_loss=0.04238, audio_tagging_loss=0.01868, over 14498.00 frames. ], tot_loss[loss=0.1325, simple_loss=0.1388, pruned_loss=0.0501, audio_tagging_loss=0.01306, over 3041874.90 frames. ], batch size: 57, lr: 2.76e-02, grad_scale: 32.0 2023-11-18 07:19:19,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=117893.33333333333, ans=0.0 2023-11-18 07:19:33,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=117960.0, ans=0.0 2023-11-18 07:19:34,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=118026.66666666667, ans=0.0 2023-11-18 07:19:36,178 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.87 vs. limit=15.0 2023-11-18 07:19:46,191 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.436e+01 1.030e+02 1.132e+02 1.306e+02 2.340e+02, threshold=2.264e+02, percent-clipped=1.0 2023-11-18 07:19:57,379 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 5700, loss[loss=0.1146, simple_loss=0.1208, pruned_loss=0.04077, audio_tagging_loss=0.01341, over 14712.00 frames. ], tot_loss[loss=0.1314, simple_loss=0.1375, pruned_loss=0.04964, audio_tagging_loss=0.01299, over 3035925.76 frames. ], batch size: 57, lr: 2.75e-02, grad_scale: 32.0 2023-11-18 07:20:03,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=118160.0, ans=0.125 2023-11-18 07:20:03,898 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.46 vs. limit=15.0 2023-11-18 07:20:13,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=118226.66666666667, ans=0.125 2023-11-18 07:20:16,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=118226.66666666667, ans=0.125 2023-11-18 07:20:20,404 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.20 vs. limit=22.5 2023-11-18 07:20:25,764 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.87 vs. limit=15.0 2023-11-18 07:20:28,940 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.48 vs. limit=12.0 2023-11-18 07:20:30,057 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.12 vs. 
limit=15.0 2023-11-18 07:20:45,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=118426.66666666667, ans=0.0 2023-11-18 07:20:53,848 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 5750, loss[loss=0.1374, simple_loss=0.1457, pruned_loss=0.05293, audio_tagging_loss=0.01161, over 15261.00 frames. ], tot_loss[loss=0.1299, simple_loss=0.1359, pruned_loss=0.04917, audio_tagging_loss=0.0128, over 3042367.52 frames. ], batch size: 53, lr: 2.75e-02, grad_scale: 32.0 2023-11-18 07:20:54,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=118493.33333333333, ans=0.1 2023-11-18 07:21:08,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=118560.0, ans=0.1 2023-11-18 07:21:29,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=118693.33333333333, ans=0.2 2023-11-18 07:21:33,452 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.63 vs. limit=22.5 2023-11-18 07:21:36,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=118693.33333333333, ans=0.125 2023-11-18 07:21:38,451 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.512e+01 1.021e+02 1.146e+02 1.318e+02 2.072e+02, threshold=2.291e+02, percent-clipped=0.0 2023-11-18 07:21:49,062 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 5800, loss[loss=0.1016, simple_loss=0.1069, pruned_loss=0.03544, audio_tagging_loss=0.01267, over 14858.00 frames. ], tot_loss[loss=0.1296, simple_loss=0.1356, pruned_loss=0.04908, audio_tagging_loss=0.01273, over 3039660.30 frames. ], batch size: 57, lr: 2.75e-02, grad_scale: 32.0 2023-11-18 07:22:09,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=118893.33333333333, ans=0.2 2023-11-18 07:22:19,435 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.88 vs. limit=15.0 2023-11-18 07:22:20,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=118960.0, ans=0.0 2023-11-18 07:22:28,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=119026.66666666667, ans=0.035 2023-11-18 07:22:34,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=119093.33333333333, ans=0.125 2023-11-18 07:22:36,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=119093.33333333333, ans=0.0 2023-11-18 07:22:41,126 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.01 vs. limit=22.5 2023-11-18 07:22:41,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=119093.33333333333, ans=0.2 2023-11-18 07:22:44,817 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 5850, loss[loss=0.1069, simple_loss=0.1073, pruned_loss=0.03397, audio_tagging_loss=0.01924, over 16299.00 frames. 
], tot_loss[loss=0.1303, simple_loss=0.1367, pruned_loss=0.04939, audio_tagging_loss=0.01256, over 3036588.39 frames. ], batch size: 63, lr: 2.74e-02, grad_scale: 32.0 2023-11-18 07:23:08,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=119293.33333333333, ans=0.125 2023-11-18 07:23:20,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=119360.0, ans=0.0 2023-11-18 07:23:25,374 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.94 vs. limit=6.0 2023-11-18 07:23:29,154 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.248e+01 1.029e+02 1.172e+02 1.323e+02 1.755e+02, threshold=2.344e+02, percent-clipped=0.0 2023-11-18 07:23:32,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=119426.66666666667, ans=0.125 2023-11-18 07:23:40,986 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 5900, loss[loss=0.141, simple_loss=0.1511, pruned_loss=0.05296, audio_tagging_loss=0.01245, over 16521.00 frames. ], tot_loss[loss=0.1299, simple_loss=0.1362, pruned_loss=0.04932, audio_tagging_loss=0.01248, over 3034831.52 frames. ], batch size: 60, lr: 2.74e-02, grad_scale: 32.0 2023-11-18 07:24:36,661 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 5950, loss[loss=0.1459, simple_loss=0.1645, pruned_loss=0.05447, audio_tagging_loss=0.009206, over 14659.00 frames. ], tot_loss[loss=0.1306, simple_loss=0.1371, pruned_loss=0.04957, audio_tagging_loss=0.0125, over 3046147.15 frames. ], batch size: 56, lr: 2.74e-02, grad_scale: 32.0 2023-11-18 07:24:54,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=119893.33333333333, ans=0.0 2023-11-18 07:25:08,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=119960.0, ans=0.0 2023-11-18 07:25:15,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=120026.66666666667, ans=0.125 2023-11-18 07:25:21,033 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.274e+01 1.016e+02 1.169e+02 1.321e+02 1.949e+02, threshold=2.338e+02, percent-clipped=0.0 2023-11-18 07:25:32,023 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 6000, loss[loss=0.1399, simple_loss=0.1427, pruned_loss=0.05605, audio_tagging_loss=0.01253, over 15612.00 frames. ], tot_loss[loss=0.1323, simple_loss=0.139, pruned_loss=0.0504, audio_tagging_loss=0.01238, over 3050008.76 frames. ], batch size: 56, lr: 2.73e-02, grad_scale: 32.0 2023-11-18 07:25:32,024 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-18 07:26:04,337 INFO [train_asr.py:1147] (3/4) Epoch 2, validation: loss=0.08772, simple_loss=0.06916, pruned_loss=0.01519, audio_tagging_loss=0.03794, over 4681554.00 frames. 
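The component losses in these entries fit together consistently: in every batch and validation line, the reported loss equals 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss (for the Epoch 2 validation entry just above, 0.5 * 0.06916 + 0.01519 + 0.03794 = 0.08771, matching the printed loss=0.08772 to rounding). A minimal sketch of that combination, assuming those scale values; the function name and signature are illustrative, not taken from train_asr.py:

import torch

def combine_losses(simple_loss: torch.Tensor,
                   pruned_loss: torch.Tensor,
                   audio_tagging_loss: torch.Tensor,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> torch.Tensor:
    # The transducer part follows the usual pruned-RNN-T pattern: the
    # scaled "simple" loss (used to obtain pruning bounds) plus the
    # pruned loss; the audio-tagging loss is added with its own scale.
    transducer_loss = simple_loss_scale * simple_loss + pruned_loss
    return transducer_loss + audio_tagging_loss_scale * audio_tagging_loss

# Check against the logged "Epoch 2, batch 4550" entry:
# 0.5 * 0.1382 + 0.05061 + 0.01267 = 0.13238, matching the printed
# tot_loss loss=0.1324 to the four significant digits shown.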
2023-11-18 07:26:04,337 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-18 07:26:10,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=120160.0, ans=0.1 2023-11-18 07:26:27,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=120293.33333333333, ans=0.125 2023-11-18 07:26:34,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=120293.33333333333, ans=0.1 2023-11-18 07:26:44,881 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 07:26:56,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=120426.66666666667, ans=0.0 2023-11-18 07:26:56,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=120426.66666666667, ans=0.125 2023-11-18 07:27:00,577 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 6050, loss[loss=0.1334, simple_loss=0.1222, pruned_loss=0.05378, audio_tagging_loss=0.01859, over 16185.00 frames. ], tot_loss[loss=0.1315, simple_loss=0.138, pruned_loss=0.04996, audio_tagging_loss=0.01252, over 3049261.80 frames. ], batch size: 60, lr: 2.73e-02, grad_scale: 32.0 2023-11-18 07:27:13,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=120560.0, ans=0.125 2023-11-18 07:27:15,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=120560.0, ans=0.1 2023-11-18 07:27:17,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=120560.0, ans=22.5 2023-11-18 07:27:27,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=120626.66666666667, ans=0.125 2023-11-18 07:27:46,219 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.607e+01 1.101e+02 1.234e+02 1.349e+02 2.388e+02, threshold=2.468e+02, percent-clipped=1.0 2023-11-18 07:27:53,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=120760.0, ans=0.125 2023-11-18 07:27:53,875 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.49 vs. limit=22.5 2023-11-18 07:27:57,582 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 6100, loss[loss=0.1491, simple_loss=0.1578, pruned_loss=0.05865, audio_tagging_loss=0.01153, over 15703.00 frames. ], tot_loss[loss=0.1304, simple_loss=0.1364, pruned_loss=0.04954, audio_tagging_loss=0.01264, over 3049989.12 frames. 
], batch size: 58, lr: 2.73e-02, grad_scale: 32.0 2023-11-18 07:27:57,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=120826.66666666667, ans=0.1 2023-11-18 07:28:15,556 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.57 vs. limit=22.5 2023-11-18 07:28:21,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=120960.0, ans=0.125 2023-11-18 07:28:34,508 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.57 vs. limit=22.5 2023-11-18 07:28:40,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=121026.66666666667, ans=0.125 2023-11-18 07:28:46,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=121093.33333333333, ans=0.0 2023-11-18 07:28:48,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=121093.33333333333, ans=0.125 2023-11-18 07:28:54,898 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 6150, loss[loss=0.09702, simple_loss=0.09694, pruned_loss=0.03369, audio_tagging_loss=0.01486, over 15745.00 frames. ], tot_loss[loss=0.1305, simple_loss=0.1364, pruned_loss=0.04958, audio_tagging_loss=0.01275, over 3047034.07 frames. ], batch size: 59, lr: 2.73e-02, grad_scale: 32.0 2023-11-18 07:29:08,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=121226.66666666667, ans=0.1 2023-11-18 07:29:08,897 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.34 vs. limit=12.0 2023-11-18 07:29:09,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=121226.66666666667, ans=0.125 2023-11-18 07:29:11,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=121226.66666666667, ans=0.2 2023-11-18 07:29:20,328 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.48 vs. limit=22.5 2023-11-18 07:29:40,651 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.796e+01 1.065e+02 1.218e+02 1.371e+02 2.442e+02, threshold=2.436e+02, percent-clipped=0.0 2023-11-18 07:29:52,066 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 6200, loss[loss=0.1494, simple_loss=0.1544, pruned_loss=0.05969, audio_tagging_loss=0.01246, over 15483.00 frames. ], tot_loss[loss=0.129, simple_loss=0.1347, pruned_loss=0.04881, audio_tagging_loss=0.01284, over 3043333.74 frames. 
], batch size: 56, lr: 2.72e-02, grad_scale: 32.0 2023-11-18 07:30:02,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=121560.0, ans=0.2 2023-11-18 07:30:04,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=121560.0, ans=10.0 2023-11-18 07:30:14,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=121626.66666666667, ans=0.0 2023-11-18 07:30:15,297 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.17 vs. limit=15.0 2023-11-18 07:30:19,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=121626.66666666667, ans=0.0 2023-11-18 07:30:22,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=121626.66666666667, ans=0.025 2023-11-18 07:30:48,807 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 6250, loss[loss=0.1308, simple_loss=0.1462, pruned_loss=0.04656, audio_tagging_loss=0.01116, over 14875.00 frames. ], tot_loss[loss=0.128, simple_loss=0.1337, pruned_loss=0.04826, audio_tagging_loss=0.01287, over 3044431.67 frames. ], batch size: 56, lr: 2.72e-02, grad_scale: 32.0 2023-11-18 07:30:49,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=121826.66666666667, ans=0.0 2023-11-18 07:30:51,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=121826.66666666667, ans=0.0 2023-11-18 07:30:57,277 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.24 vs. limit=22.5 2023-11-18 07:31:12,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=121960.0, ans=0.1 2023-11-18 07:31:19,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=121960.0, ans=0.125 2023-11-18 07:31:33,887 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.370e+01 1.000e+02 1.109e+02 1.237e+02 1.670e+02, threshold=2.218e+02, percent-clipped=0.0 2023-11-18 07:31:34,769 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.94 vs. limit=15.0 2023-11-18 07:31:40,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=122093.33333333333, ans=0.0 2023-11-18 07:31:45,282 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 6300, loss[loss=0.08858, simple_loss=0.07837, pruned_loss=0.03084, audio_tagging_loss=0.01855, over 17759.00 frames. ], tot_loss[loss=0.1279, simple_loss=0.1332, pruned_loss=0.04817, audio_tagging_loss=0.01313, over 3052960.36 frames. 
], batch size: 71, lr: 2.72e-02, grad_scale: 32.0 2023-11-18 07:31:49,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=122160.0, ans=0.125 2023-11-18 07:32:01,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=122226.66666666667, ans=0.125 2023-11-18 07:32:06,231 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.58 vs. limit=22.5 2023-11-18 07:32:27,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=122360.0, ans=0.025 2023-11-18 07:32:42,049 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 6350, loss[loss=0.1514, simple_loss=0.1542, pruned_loss=0.06619, audio_tagging_loss=0.008088, over 14494.00 frames. ], tot_loss[loss=0.1275, simple_loss=0.1326, pruned_loss=0.04803, audio_tagging_loss=0.01314, over 3051267.80 frames. ], batch size: 55, lr: 2.71e-02, grad_scale: 32.0 2023-11-18 07:32:52,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=122560.0, ans=0.125 2023-11-18 07:33:01,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=122560.0, ans=0.0 2023-11-18 07:33:03,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.11 vs. limit=6.0 2023-11-18 07:33:23,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=122693.33333333333, ans=0.1 2023-11-18 07:33:23,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=122693.33333333333, ans=0.1 2023-11-18 07:33:27,680 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.961e+01 1.030e+02 1.146e+02 1.327e+02 2.114e+02, threshold=2.291e+02, percent-clipped=0.0 2023-11-18 07:33:30,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=122760.0, ans=0.0 2023-11-18 07:33:30,113 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:33:36,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=122760.0, ans=0.0 2023-11-18 07:33:39,606 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 6400, loss[loss=0.1167, simple_loss=0.1196, pruned_loss=0.04363, audio_tagging_loss=0.01322, over 14655.00 frames. ], tot_loss[loss=0.1285, simple_loss=0.1338, pruned_loss=0.04844, audio_tagging_loss=0.01313, over 3048321.83 frames. 
], batch size: 55, lr: 2.71e-02, grad_scale: 32.0 2023-11-18 07:33:43,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=122826.66666666667, ans=0.125 2023-11-18 07:33:56,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=122893.33333333333, ans=0.0 2023-11-18 07:34:14,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=123026.66666666667, ans=0.1 2023-11-18 07:34:15,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=123026.66666666667, ans=0.125 2023-11-18 07:34:32,503 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:34:35,470 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 6450, loss[loss=0.08572, simple_loss=0.08348, pruned_loss=0.02985, audio_tagging_loss=0.01414, over 15464.00 frames. ], tot_loss[loss=0.1292, simple_loss=0.1348, pruned_loss=0.04872, audio_tagging_loss=0.0131, over 3050086.87 frames. ], batch size: 58, lr: 2.71e-02, grad_scale: 32.0 2023-11-18 07:34:36,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=123160.0, ans=0.1 2023-11-18 07:34:38,786 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.24 vs. limit=6.0 2023-11-18 07:34:55,222 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.93 vs. limit=6.0 2023-11-18 07:35:10,699 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:35:11,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=123360.0, ans=0.1 2023-11-18 07:35:15,745 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.78 vs. limit=22.5 2023-11-18 07:35:20,676 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 1.021e+02 1.177e+02 1.311e+02 2.345e+02, threshold=2.354e+02, percent-clipped=1.0 2023-11-18 07:35:31,883 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 6500, loss[loss=0.1588, simple_loss=0.1673, pruned_loss=0.06279, audio_tagging_loss=0.01241, over 14163.00 frames. ], tot_loss[loss=0.1298, simple_loss=0.1353, pruned_loss=0.04909, audio_tagging_loss=0.01302, over 3052072.32 frames. ], batch size: 54, lr: 2.70e-02, grad_scale: 32.0 2023-11-18 07:35:35,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=123493.33333333333, ans=0.0 2023-11-18 07:35:57,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=123626.66666666667, ans=0.0 2023-11-18 07:36:23,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=123760.0, ans=0.0 2023-11-18 07:36:28,332 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 6550, loss[loss=0.1399, simple_loss=0.1388, pruned_loss=0.05825, audio_tagging_loss=0.01228, over 15248.00 frames. 
], tot_loss[loss=0.1298, simple_loss=0.1356, pruned_loss=0.04919, audio_tagging_loss=0.01278, over 3049575.34 frames. ], batch size: 56, lr: 2.70e-02, grad_scale: 32.0 2023-11-18 07:36:29,977 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.12 vs. limit=15.0 2023-11-18 07:36:58,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=123960.0, ans=0.5 2023-11-18 07:37:01,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=124026.66666666667, ans=0.125 2023-11-18 07:37:13,829 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 9.989e+01 1.139e+02 1.347e+02 1.768e+02, threshold=2.277e+02, percent-clipped=0.0 2023-11-18 07:37:25,653 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 6600, loss[loss=0.1471, simple_loss=0.1525, pruned_loss=0.05681, audio_tagging_loss=0.01405, over 15142.00 frames. ], tot_loss[loss=0.1279, simple_loss=0.1336, pruned_loss=0.04836, audio_tagging_loss=0.01277, over 3048752.89 frames. ], batch size: 56, lr: 2.70e-02, grad_scale: 32.0 2023-11-18 07:37:34,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=124160.0, ans=0.125 2023-11-18 07:37:58,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=124360.0, ans=0.125 2023-11-18 07:38:01,276 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.75 vs. limit=22.5 2023-11-18 07:38:05,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=124360.0, ans=0.0 2023-11-18 07:38:08,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=124360.0, ans=0.1 2023-11-18 07:38:12,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=124426.66666666667, ans=0.125 2023-11-18 07:38:15,651 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.48 vs. limit=22.5 2023-11-18 07:38:21,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=124493.33333333333, ans=0.1 2023-11-18 07:38:22,528 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 6650, loss[loss=0.1245, simple_loss=0.1265, pruned_loss=0.04946, audio_tagging_loss=0.01176, over 13890.00 frames. ], tot_loss[loss=0.1292, simple_loss=0.1352, pruned_loss=0.04892, audio_tagging_loss=0.01265, over 3046897.97 frames. 
], batch size: 53, lr: 2.69e-02, grad_scale: 32.0 2023-11-18 07:38:35,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=124560.0, ans=0.2 2023-11-18 07:38:56,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=124693.33333333333, ans=0.0 2023-11-18 07:39:07,725 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.903e+01 1.041e+02 1.128e+02 1.286e+02 1.870e+02, threshold=2.255e+02, percent-clipped=0.0 2023-11-18 07:39:15,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=124760.0, ans=0.125 2023-11-18 07:39:18,527 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 6700, loss[loss=0.1283, simple_loss=0.1382, pruned_loss=0.04631, audio_tagging_loss=0.01288, over 15029.00 frames. ], tot_loss[loss=0.1295, simple_loss=0.1359, pruned_loss=0.04903, audio_tagging_loss=0.01248, over 3039510.80 frames. ], batch size: 58, lr: 2.69e-02, grad_scale: 32.0 2023-11-18 07:39:41,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=124960.0, ans=0.0 2023-11-18 07:39:43,905 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.32 vs. limit=10.0 2023-11-18 07:39:47,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=124960.0, ans=0.1 2023-11-18 07:39:51,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=124960.0, ans=0.125 2023-11-18 07:39:51,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=124960.0, ans=0.125 2023-11-18 07:40:01,139 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.85 vs. limit=15.0 2023-11-18 07:40:09,157 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.14 vs. limit=15.0 2023-11-18 07:40:16,329 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 6750, loss[loss=0.1206, simple_loss=0.1288, pruned_loss=0.04616, audio_tagging_loss=0.01007, over 14551.00 frames. ], tot_loss[loss=0.1295, simple_loss=0.1358, pruned_loss=0.04905, audio_tagging_loss=0.01255, over 3041763.91 frames. ], batch size: 56, lr: 2.69e-02, grad_scale: 32.0 2023-11-18 07:40:22,810 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.26 vs. 
limit=15.0 2023-11-18 07:40:23,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=125160.0, ans=0.125 2023-11-18 07:40:37,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=125293.33333333333, ans=0.0 2023-11-18 07:41:00,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=125426.66666666667, ans=0.04949747468305833 2023-11-18 07:41:01,712 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.301e+01 1.017e+02 1.137e+02 1.334e+02 2.157e+02, threshold=2.275e+02, percent-clipped=0.0 2023-11-18 07:41:02,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=125426.66666666667, ans=0.0 2023-11-18 07:41:10,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=125426.66666666667, ans=0.125 2023-11-18 07:41:13,076 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 6800, loss[loss=0.07743, simple_loss=0.0704, pruned_loss=0.02577, audio_tagging_loss=0.01646, over 14089.00 frames. ], tot_loss[loss=0.1299, simple_loss=0.1363, pruned_loss=0.04922, audio_tagging_loss=0.01248, over 3050136.21 frames. ], batch size: 57, lr: 2.68e-02, grad_scale: 32.0 2023-11-18 07:41:26,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=125560.0, ans=0.125 2023-11-18 07:41:32,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=125560.0, ans=0.125 2023-11-18 07:41:39,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=125626.66666666667, ans=0.1 2023-11-18 07:41:48,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=125693.33333333333, ans=0.1 2023-11-18 07:42:09,007 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 6850, loss[loss=0.1207, simple_loss=0.1172, pruned_loss=0.0515, audio_tagging_loss=0.01057, over 14214.00 frames. ], tot_loss[loss=0.1292, simple_loss=0.1357, pruned_loss=0.04887, audio_tagging_loss=0.01251, over 3048552.51 frames. 
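The optim.py lines above print the quartiles of recent gradient norms next to the clipping threshold, and the threshold tracks Clipping_scale times the running median (2.0 x 1.137e+02 is within rounding of the 2.275e+02 printed above); the small mismatches suggest the median comes from a running buffer rather than the exact printed quartile. A rough sketch of such a scheme, with the buffer size as an assumption:

    import torch

    # Median-based gradient clipping, sketched from the log's fields:
    # threshold = clipping_scale * median(recent grad norms). The window
    # size of 400 is an assumption, not taken from the log.
    class MedianClipper:
        def __init__(self, clipping_scale=2.0, window=400):
            self.clipping_scale = clipping_scale
            self.window = window
            self.norms = []
            self.clipped = 0
            self.total = 0

        def clip_(self, params):
            grads = [p.grad for p in params if p.grad is not None]
            norm = torch.norm(torch.stack([g.norm() for g in grads]))
            self.norms = (self.norms + [norm.item()])[-self.window:]
            median = sorted(self.norms)[len(self.norms) // 2]
            threshold = self.clipping_scale * median
            self.total += 1
            if norm > threshold:          # counted in percent-clipped
                self.clipped += 1
                for g in grads:
                    g.mul_(threshold / norm)
            return norm, threshold

        def percent_clipped(self):
            return 100.0 * self.clipped / max(1, self.total)

percent-clipped=0.0 on most of these lines simply means no batch in the reporting interval exceeded its threshold.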
], batch size: 54, lr: 2.68e-02, grad_scale: 32.0 2023-11-18 07:42:13,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=125826.66666666667, ans=0.125 2023-11-18 07:42:15,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=125826.66666666667, ans=0.1 2023-11-18 07:42:16,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=125826.66666666667, ans=0.125 2023-11-18 07:42:53,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=126093.33333333333, ans=0.125 2023-11-18 07:42:54,310 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.768e+01 9.921e+01 1.152e+02 1.334e+02 2.003e+02, threshold=2.305e+02, percent-clipped=0.0 2023-11-18 07:43:04,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=126160.0, ans=0.1 2023-11-18 07:43:05,705 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 6900, loss[loss=0.12, simple_loss=0.1184, pruned_loss=0.04797, audio_tagging_loss=0.0128, over 16594.00 frames. ], tot_loss[loss=0.1294, simple_loss=0.136, pruned_loss=0.04888, audio_tagging_loss=0.01254, over 3050916.51 frames. ], batch size: 64, lr: 2.68e-02, grad_scale: 32.0 2023-11-18 07:43:08,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=126160.0, ans=0.0 2023-11-18 07:43:14,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=126160.0, ans=0.025 2023-11-18 07:43:32,922 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.05 vs. limit=15.0 2023-11-18 07:43:44,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=126360.0, ans=0.1 2023-11-18 07:43:48,927 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 07:43:53,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=126426.66666666667, ans=0.125 2023-11-18 07:44:01,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=126426.66666666667, ans=0.0 2023-11-18 07:44:03,097 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 6950, loss[loss=0.1274, simple_loss=0.1254, pruned_loss=0.0512, audio_tagging_loss=0.01349, over 15579.00 frames. ], tot_loss[loss=0.1294, simple_loss=0.1359, pruned_loss=0.04896, audio_tagging_loss=0.01247, over 3055298.65 frames. 
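The WARNING above drops a 1-second AudioSet cut because, after the frontend's 4x subsampling, it has fewer frames than BPE tokens (23 < 24), and a transducer cannot emit more labels than it has frames. The exact length formula is an assumption here; ((T - 7) // 2 + 1) // 2 is one form consistent with the logged 100 -> 23:

    # Sketch of the exclusion rule behind the WARNING above. The length
    # formula is an assumption consistent with 100 -> 23 frames.
    def frames_after_subsampling(num_frames):
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames, num_tokens):
        # A transducer alignment needs at least one frame per token.
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23
    print(keep_cut(100, 24))              # False: the cut is excluded

The dummy transcript ("Dummy text added as a place holder...") is what makes these tagging-only cuts trip the rule: 24 tokens of filler against 23 usable frames.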
], batch size: 58, lr: 2.68e-02, grad_scale: 32.0 2023-11-18 07:44:04,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=126493.33333333333, ans=0.2 2023-11-18 07:44:25,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=126626.66666666667, ans=0.2 2023-11-18 07:44:48,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=126760.0, ans=0.035 2023-11-18 07:44:48,853 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.915e+01 1.002e+02 1.157e+02 1.287e+02 1.874e+02, threshold=2.315e+02, percent-clipped=0.0 2023-11-18 07:44:59,693 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 7000, loss[loss=0.09541, simple_loss=0.09093, pruned_loss=0.03373, audio_tagging_loss=0.01622, over 14928.00 frames. ], tot_loss[loss=0.1275, simple_loss=0.134, pruned_loss=0.04792, audio_tagging_loss=0.01256, over 3054880.86 frames. ], batch size: 57, lr: 2.67e-02, grad_scale: 32.0 2023-11-18 07:45:01,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=126826.66666666667, ans=0.125 2023-11-18 07:45:02,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=126826.66666666667, ans=0.125 2023-11-18 07:45:08,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=126826.66666666667, ans=0.0 2023-11-18 07:45:24,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=126960.0, ans=0.125 2023-11-18 07:45:34,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=127026.66666666667, ans=0.125 2023-11-18 07:45:54,570 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.37 vs. limit=15.0 2023-11-18 07:45:56,130 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 7050, loss[loss=0.1864, simple_loss=0.189, pruned_loss=0.07936, audio_tagging_loss=0.01257, over 15291.00 frames. ], tot_loss[loss=0.1268, simple_loss=0.1329, pruned_loss=0.04767, audio_tagging_loss=0.01271, over 3051191.20 frames. 
], batch size: 54, lr: 2.67e-02, grad_scale: 32.0 2023-11-18 07:46:00,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=127160.0, ans=0.125 2023-11-18 07:46:16,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=127226.66666666667, ans=0.2 2023-11-18 07:46:17,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=127226.66666666667, ans=0.125 2023-11-18 07:46:17,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=127226.66666666667, ans=0.0 2023-11-18 07:46:29,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=127360.0, ans=0.5 2023-11-18 07:46:41,216 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 1.025e+02 1.162e+02 1.246e+02 1.816e+02, threshold=2.324e+02, percent-clipped=0.0 2023-11-18 07:46:53,110 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 7100, loss[loss=0.1106, simple_loss=0.111, pruned_loss=0.04131, audio_tagging_loss=0.01377, over 14619.00 frames. ], tot_loss[loss=0.1287, simple_loss=0.135, pruned_loss=0.04849, audio_tagging_loss=0.01268, over 3046812.28 frames. ], batch size: 57, lr: 2.67e-02, grad_scale: 32.0 2023-11-18 07:47:02,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=127493.33333333333, ans=0.0 2023-11-18 07:47:22,782 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.04 vs. limit=15.0 2023-11-18 07:47:28,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=127693.33333333333, ans=0.125 2023-11-18 07:47:49,844 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 7150, loss[loss=0.1121, simple_loss=0.1098, pruned_loss=0.03814, audio_tagging_loss=0.01907, over 14583.00 frames. ], tot_loss[loss=0.1297, simple_loss=0.136, pruned_loss=0.04895, audio_tagging_loss=0.01275, over 3043261.15 frames. ], batch size: 55, lr: 2.66e-02, grad_scale: 64.0 2023-11-18 07:48:35,317 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.931e+01 1.022e+02 1.134e+02 1.283e+02 2.595e+02, threshold=2.267e+02, percent-clipped=2.0 2023-11-18 07:48:46,826 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 7200, loss[loss=0.1236, simple_loss=0.1445, pruned_loss=0.03784, audio_tagging_loss=0.01352, over 14324.00 frames. ], tot_loss[loss=0.1296, simple_loss=0.136, pruned_loss=0.04882, audio_tagging_loss=0.01277, over 3045739.23 frames. ], batch size: 53, lr: 2.66e-02, grad_scale: 64.0 2023-11-18 07:48:47,397 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.17 vs. 
limit=15.0 2023-11-18 07:49:01,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=128226.66666666667, ans=10.0 2023-11-18 07:49:21,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=128360.0, ans=0.125 2023-11-18 07:49:33,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=128426.66666666667, ans=10.0 2023-11-18 07:49:39,425 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:49:44,003 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 7250, loss[loss=0.147, simple_loss=0.1491, pruned_loss=0.05956, audio_tagging_loss=0.01287, over 16076.00 frames. ], tot_loss[loss=0.1287, simple_loss=0.1348, pruned_loss=0.04838, audio_tagging_loss=0.01292, over 3042427.48 frames. ], batch size: 58, lr: 2.66e-02, grad_scale: 64.0 2023-11-18 07:49:49,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=128493.33333333333, ans=0.125 2023-11-18 07:50:16,996 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.75 vs. limit=15.0 2023-11-18 07:50:20,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=128693.33333333333, ans=0.0 2023-11-18 07:50:29,532 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.726e+01 1.028e+02 1.141e+02 1.257e+02 1.791e+02, threshold=2.282e+02, percent-clipped=0.0 2023-11-18 07:50:40,926 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 7300, loss[loss=0.09566, simple_loss=0.09343, pruned_loss=0.0367, audio_tagging_loss=0.01225, over 14939.00 frames. ], tot_loss[loss=0.1284, simple_loss=0.1345, pruned_loss=0.04831, audio_tagging_loss=0.01279, over 3041088.27 frames. ], batch size: 56, lr: 2.65e-02, grad_scale: 64.0 2023-11-18 07:50:42,664 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.73 vs. limit=15.0 2023-11-18 07:50:53,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=128893.33333333333, ans=0.125 2023-11-18 07:50:56,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=128893.33333333333, ans=0.1 2023-11-18 07:50:58,510 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.85 vs. limit=15.0 2023-11-18 07:51:11,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=128960.0, ans=0.5 2023-11-18 07:51:24,973 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.57 vs. 
limit=15.0 2023-11-18 07:51:29,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=129093.33333333333, ans=0.125 2023-11-18 07:51:32,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=129093.33333333333, ans=0.0 2023-11-18 07:51:37,923 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 7350, loss[loss=0.1526, simple_loss=0.1694, pruned_loss=0.0558, audio_tagging_loss=0.01209, over 15562.00 frames. ], tot_loss[loss=0.1285, simple_loss=0.1351, pruned_loss=0.04839, audio_tagging_loss=0.01255, over 3037196.10 frames. ], batch size: 57, lr: 2.65e-02, grad_scale: 64.0 2023-11-18 07:52:09,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=129293.33333333333, ans=0.125 2023-11-18 07:52:15,343 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.38 vs. limit=6.0 2023-11-18 07:52:20,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=129360.0, ans=0.125 2023-11-18 07:52:23,831 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 1.003e+02 1.122e+02 1.264e+02 2.098e+02, threshold=2.243e+02, percent-clipped=0.0 2023-11-18 07:52:35,934 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 7400, loss[loss=0.08748, simple_loss=0.08873, pruned_loss=0.03109, audio_tagging_loss=0.01203, over 14935.00 frames. ], tot_loss[loss=0.128, simple_loss=0.1347, pruned_loss=0.04813, audio_tagging_loss=0.01247, over 3037884.02 frames. ], batch size: 56, lr: 2.65e-02, grad_scale: 64.0 2023-11-18 07:53:11,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=129693.33333333333, ans=0.1 2023-11-18 07:53:22,104 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.34 vs. limit=15.0 2023-11-18 07:53:32,161 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 7450, loss[loss=0.1207, simple_loss=0.1267, pruned_loss=0.04428, audio_tagging_loss=0.0131, over 15340.00 frames. ], tot_loss[loss=0.1271, simple_loss=0.1338, pruned_loss=0.04777, audio_tagging_loss=0.01246, over 3036556.15 frames. ], batch size: 59, lr: 2.65e-02, grad_scale: 64.0 2023-11-18 07:53:49,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=129893.33333333333, ans=0.125 2023-11-18 07:54:02,740 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.21 vs. limit=10.0 2023-11-18 07:54:08,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=130026.66666666667, ans=0.125 2023-11-18 07:54:17,507 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.615e+01 1.036e+02 1.151e+02 1.366e+02 1.976e+02, threshold=2.301e+02, percent-clipped=0.0 2023-11-18 07:54:26,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.88 vs. 
limit=8.0 2023-11-18 07:54:26,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=130093.33333333333, ans=0.0 2023-11-18 07:54:29,346 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 7500, loss[loss=0.09397, simple_loss=0.0909, pruned_loss=0.03309, audio_tagging_loss=0.01543, over 14432.00 frames. ], tot_loss[loss=0.1286, simple_loss=0.1352, pruned_loss=0.04853, audio_tagging_loss=0.01245, over 3036703.88 frames. ], batch size: 55, lr: 2.64e-02, grad_scale: 64.0 2023-11-18 07:54:30,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=130160.0, ans=0.0 2023-11-18 07:54:33,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=130160.0, ans=0.0 2023-11-18 07:55:25,989 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 7550, loss[loss=0.1243, simple_loss=0.1268, pruned_loss=0.04601, audio_tagging_loss=0.01493, over 14960.00 frames. ], tot_loss[loss=0.128, simple_loss=0.1345, pruned_loss=0.04834, audio_tagging_loss=0.01245, over 3042318.98 frames. ], batch size: 57, lr: 2.64e-02, grad_scale: 64.0 2023-11-18 07:55:47,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=130626.66666666667, ans=6.0 2023-11-18 07:56:12,146 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.352e+01 1.032e+02 1.116e+02 1.247e+02 1.797e+02, threshold=2.232e+02, percent-clipped=0.0 2023-11-18 07:56:12,795 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=1.97 vs. limit=15.0 2023-11-18 07:56:22,898 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 7600, loss[loss=0.1143, simple_loss=0.1109, pruned_loss=0.04468, audio_tagging_loss=0.01413, over 15066.00 frames. ], tot_loss[loss=0.1273, simple_loss=0.1336, pruned_loss=0.04802, audio_tagging_loss=0.0125, over 3039187.70 frames. ], batch size: 58, lr: 2.64e-02, grad_scale: 64.0 2023-11-18 07:56:24,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=130826.66666666667, ans=0.1 2023-11-18 07:56:30,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=130826.66666666667, ans=0.0 2023-11-18 07:56:39,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=130893.33333333333, ans=0.0 2023-11-18 07:56:50,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=130960.0, ans=0.0 2023-11-18 07:57:12,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=131093.33333333334, ans=0.2 2023-11-18 07:57:19,657 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 7650, loss[loss=0.1208, simple_loss=0.1255, pruned_loss=0.04592, audio_tagging_loss=0.01209, over 14101.00 frames. ], tot_loss[loss=0.1273, simple_loss=0.1337, pruned_loss=0.04787, audio_tagging_loss=0.01252, over 3030394.14 frames. 
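The lr printed on each batch line decays slowly (2.70e-02 around batch 6600, 2.64e-02 by batch 7600 here), which is the shape of an Eden-style schedule: inverse-fourth-root factors in both the global step and the epoch. The flag values and the step count below are assumptions; solving the logged 2.70e-02 for the step puts it near 16.5k, which would be plausible if epoch 1 ran roughly 9.9k batches.

    # Sketch of an Eden-style LR schedule (as used in icefall recipes).
    # base_lr=0.045, lr_batches=7500, lr_epochs=3.5 and step=16500 are
    # assumptions, not values read from this section of the log.
    def eden_lr(base_lr, step, epoch, lr_batches=7500.0, lr_epochs=3.5):
        batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(eden_lr(0.045, step=16500, epoch=2))  # ~0.0270, cf. lr: 2.70e-02

The -0.25 powers are why the lr only moves in the third decimal place across several hundred batches at this depth into training.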
], batch size: 53, lr: 2.63e-02, grad_scale: 64.0 2023-11-18 07:57:23,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=131160.0, ans=0.0 2023-11-18 07:57:26,058 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.80 vs. limit=15.0 2023-11-18 07:57:27,192 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.50 vs. limit=6.0 2023-11-18 07:57:29,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=131160.0, ans=0.0 2023-11-18 07:57:44,539 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.28 vs. limit=15.0 2023-11-18 07:58:05,097 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.764e+01 1.041e+02 1.157e+02 1.349e+02 1.751e+02, threshold=2.314e+02, percent-clipped=0.0 2023-11-18 07:58:14,947 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.56 vs. limit=15.0 2023-11-18 07:58:16,545 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 7700, loss[loss=0.1277, simple_loss=0.1287, pruned_loss=0.05123, audio_tagging_loss=0.01213, over 15171.00 frames. ], tot_loss[loss=0.1268, simple_loss=0.1329, pruned_loss=0.0478, audio_tagging_loss=0.01255, over 3035680.03 frames. ], batch size: 59, lr: 2.63e-02, grad_scale: 64.0 2023-11-18 07:58:24,167 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.68 vs. limit=15.0 2023-11-18 07:58:38,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=131626.66666666666, ans=0.125 2023-11-18 07:58:41,709 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.51 vs. limit=15.0 2023-11-18 07:58:43,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=131626.66666666666, ans=0.0 2023-11-18 07:58:51,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=131693.33333333334, ans=0.125 2023-11-18 07:59:13,108 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 7750, loss[loss=0.07934, simple_loss=0.08622, pruned_loss=0.02405, audio_tagging_loss=0.01218, over 15356.00 frames. ], tot_loss[loss=0.1266, simple_loss=0.1326, pruned_loss=0.04763, audio_tagging_loss=0.01265, over 3038161.81 frames. ], batch size: 59, lr: 2.63e-02, grad_scale: 64.0 2023-11-18 07:59:37,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=131960.0, ans=0.125 2023-11-18 07:59:58,785 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 1.007e+02 1.110e+02 1.214e+02 2.200e+02, threshold=2.220e+02, percent-clipped=0.0 2023-11-18 07:59:59,366 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.95 vs. 
limit=15.0 2023-11-18 08:00:09,677 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 7800, loss[loss=0.1612, simple_loss=0.1745, pruned_loss=0.06489, audio_tagging_loss=0.009061, over 15345.00 frames. ], tot_loss[loss=0.1271, simple_loss=0.1333, pruned_loss=0.04784, audio_tagging_loss=0.01263, over 3037650.31 frames. ], batch size: 56, lr: 2.62e-02, grad_scale: 64.0 2023-11-18 08:00:16,479 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.30 vs. limit=15.0 2023-11-18 08:00:20,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=132226.66666666666, ans=0.125 2023-11-18 08:00:26,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=132226.66666666666, ans=0.1 2023-11-18 08:01:07,209 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 7850, loss[loss=0.1233, simple_loss=0.1176, pruned_loss=0.04739, audio_tagging_loss=0.01711, over 15630.00 frames. ], tot_loss[loss=0.1271, simple_loss=0.1335, pruned_loss=0.04773, audio_tagging_loss=0.01262, over 3039495.28 frames. ], batch size: 57, lr: 2.62e-02, grad_scale: 64.0 2023-11-18 08:01:26,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=132560.0, ans=0.125 2023-11-18 08:01:32,178 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=18.89 vs. limit=15.0 2023-11-18 08:01:34,320 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.27 vs. limit=22.5 2023-11-18 08:01:37,767 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0 2023-11-18 08:01:38,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=132626.66666666666, ans=0.125 2023-11-18 08:01:48,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=132693.33333333334, ans=0.0 2023-11-18 08:01:53,777 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.431e+01 1.081e+02 1.200e+02 1.326e+02 3.280e+02, threshold=2.400e+02, percent-clipped=1.0 2023-11-18 08:02:03,873 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 7900, loss[loss=0.1599, simple_loss=0.1725, pruned_loss=0.0632, audio_tagging_loss=0.01039, over 14797.00 frames. ], tot_loss[loss=0.127, simple_loss=0.1334, pruned_loss=0.04758, audio_tagging_loss=0.01271, over 3044544.94 frames. 
], batch size: 54, lr: 2.62e-02, grad_scale: 32.0 2023-11-18 08:02:06,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=132826.66666666666, ans=0.0 2023-11-18 08:02:17,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=132893.33333333334, ans=0.2 2023-11-18 08:02:28,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=132960.0, ans=0.025 2023-11-18 08:02:57,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=133093.33333333334, ans=0.0 2023-11-18 08:02:59,851 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 7950, loss[loss=0.1031, simple_loss=0.108, pruned_loss=0.03793, audio_tagging_loss=0.01117, over 15173.00 frames. ], tot_loss[loss=0.1282, simple_loss=0.1346, pruned_loss=0.04829, audio_tagging_loss=0.01258, over 3051064.19 frames. ], batch size: 59, lr: 2.62e-02, grad_scale: 32.0 2023-11-18 08:03:12,343 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 08:03:20,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=133226.66666666666, ans=0.125 2023-11-18 08:03:20,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=133226.66666666666, ans=0.125 2023-11-18 08:03:27,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=133293.33333333334, ans=0.125 2023-11-18 08:03:29,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=133293.33333333334, ans=0.0 2023-11-18 08:03:35,122 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.32 vs. limit=15.0 2023-11-18 08:03:48,630 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.322e+01 1.027e+02 1.116e+02 1.320e+02 1.890e+02, threshold=2.232e+02, percent-clipped=0.0 2023-11-18 08:03:58,886 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 8000, loss[loss=0.1484, simple_loss=0.1476, pruned_loss=0.05898, audio_tagging_loss=0.01566, over 15029.00 frames. ], tot_loss[loss=0.1269, simple_loss=0.1329, pruned_loss=0.04769, audio_tagging_loss=0.01274, over 3044672.17 frames. 
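The grad_scale field on each batch line is the dynamic loss scale of mixed-precision training: it sits at 32.0, doubles to 64.0 around batch 7150 above, and is back at 32.0 by batch 7900, the signature of a scaler that grows after a run of overflow-free steps and backs off when an inf/nan gradient appears. A sketch with PyTorch's stock GradScaler; the constructor arguments are illustrative, not recovered from this run:

    import torch

    # Dynamic loss scaling as in torch.cuda.amp; init_scale and
    # growth_interval here are assumptions chosen to mirror the
    # 32 -> 64 -> 32 pattern seen in the lines above.
    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0, growth_factor=2.0,
        backoff_factor=0.5, growth_interval=2000,
    )

    # Typical step (model, optimizer, batch assumed defined elsewhere):
    #   with torch.cuda.amp.autocast():
    #       loss = compute_loss(model, batch)
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)     # skipped on inf/nan; scale backs off
    #   scaler.update()            # grows the scale after stable steps
    #   print(scaler.get_scale())  # the value logged as grad_scale

Under this reading, a halving would mark a skipped optimizer step, not a change in the training schedule.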
], batch size: 57, lr: 2.61e-02, grad_scale: 32.0 2023-11-18 08:04:00,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=133493.33333333334, ans=0.2 2023-11-18 08:04:09,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=133493.33333333334, ans=0.125 2023-11-18 08:04:24,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=133626.66666666666, ans=0.0 2023-11-18 08:04:43,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=133760.0, ans=0.05 2023-11-18 08:04:44,839 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 08:04:47,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=133760.0, ans=0.125 2023-11-18 08:04:56,042 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 8050, loss[loss=0.1396, simple_loss=0.1404, pruned_loss=0.05774, audio_tagging_loss=0.01167, over 14539.00 frames. ], tot_loss[loss=0.1284, simple_loss=0.1343, pruned_loss=0.04849, audio_tagging_loss=0.01276, over 3042420.59 frames. ], batch size: 54, lr: 2.61e-02, grad_scale: 32.0 2023-11-18 08:05:11,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=133893.33333333334, ans=0.125 2023-11-18 08:05:20,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=133960.0, ans=0.0 2023-11-18 08:05:42,354 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.010e+01 1.125e+02 1.300e+02 1.606e+02 2.209e+02, threshold=2.601e+02, percent-clipped=0.0 2023-11-18 08:05:42,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=134093.33333333334, ans=0.125 2023-11-18 08:05:52,001 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 8100, loss[loss=0.1665, simple_loss=0.181, pruned_loss=0.06691, audio_tagging_loss=0.009054, over 16033.00 frames. ], tot_loss[loss=0.1279, simple_loss=0.1341, pruned_loss=0.04815, audio_tagging_loss=0.0127, over 3037812.33 frames. ], batch size: 57, lr: 2.61e-02, grad_scale: 32.0 2023-11-18 08:05:52,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=134160.0, ans=0.015 2023-11-18 08:05:55,782 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.93 vs. 
limit=10.0 2023-11-18 08:06:23,380 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 08:06:28,704 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=6.943e+00 2023-11-18 08:06:30,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=134360.0, ans=0.1 2023-11-18 08:06:31,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=134360.0, ans=0.1 2023-11-18 08:06:36,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=134426.66666666666, ans=0.0 2023-11-18 08:06:36,679 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.28 vs. limit=15.0 2023-11-18 08:06:47,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=134493.33333333334, ans=0.0 2023-11-18 08:06:48,380 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 8150, loss[loss=0.1756, simple_loss=0.192, pruned_loss=0.06957, audio_tagging_loss=0.01002, over 14235.00 frames. ], tot_loss[loss=0.128, simple_loss=0.1347, pruned_loss=0.04814, audio_tagging_loss=0.01254, over 3042271.37 frames. ], batch size: 52, lr: 2.60e-02, grad_scale: 32.0 2023-11-18 08:06:48,851 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.81 vs. limit=15.0 2023-11-18 08:06:53,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=134493.33333333334, ans=0.0 2023-11-18 08:06:54,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=134493.33333333334, ans=0.125 2023-11-18 08:07:19,329 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.49 vs. limit=15.0 2023-11-18 08:07:24,882 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.67 vs. limit=6.0 2023-11-18 08:07:34,443 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.378e+01 1.047e+02 1.173e+02 1.336e+02 3.591e+02, threshold=2.346e+02, percent-clipped=1.0 2023-11-18 08:07:40,304 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.77 vs. limit=15.0 2023-11-18 08:07:44,589 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 08:07:45,621 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 8200, loss[loss=0.1492, simple_loss=0.1493, pruned_loss=0.06142, audio_tagging_loss=0.01319, over 16164.00 frames. 
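The scaling.py:1118 WithLoss lines above report an auxiliary penalty attached to the attention weights: loss-sum=0.000e+00 means no weight tripped the constraint in that batch, while the 6.943e+00 entry shows one layer paying a small penalty. A plausible reconstruction is a straight-through autograd function that returns its input unchanged but injects the gradient of an auxiliary loss in backward; the limit and scale below are assumptions:

    import torch

    # Straight-through auxiliary loss: forward is the identity on x, but
    # backward behaves as if aux_loss.sum() had been added to the training
    # loss (its gradient is all-ones, independent of upstream scaling).
    class WithLoss(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x, aux_loss):
            ctx.aux_shape = aux_loss.shape
            # A logging hook here would print aux_loss.sum() as loss-sum.
            return x

        @staticmethod
        def backward(ctx, x_grad):
            ones = torch.ones(ctx.aux_shape, dtype=x_grad.dtype,
                              device=x_grad.device)
            return x_grad, ones

    def penalize_values_gt(x, limit=25.0, penalty=1.0e-04):
        # Penalize attention-weight magnitudes above an assumed limit.
        aux = penalty * (x.abs() - limit).relu()
        return WithLoss.apply(x, aux)

With this construction a well-behaved layer reports loss-sum=0.000e+00 exactly, matching most of the entries in this log.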
], tot_loss[loss=0.1271, simple_loss=0.1341, pruned_loss=0.04769, audio_tagging_loss=0.01239, over 3051922.19 frames. ], batch size: 60, lr: 2.60e-02, grad_scale: 32.0 2023-11-18 08:07:48,250 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.86 vs. limit=15.0 2023-11-18 08:07:52,468 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.03 vs. limit=15.0 2023-11-18 08:08:16,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=134960.0, ans=0.125 2023-11-18 08:08:38,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=135093.33333333334, ans=0.0 2023-11-18 08:08:41,611 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 8250, loss[loss=0.1198, simple_loss=0.1331, pruned_loss=0.04169, audio_tagging_loss=0.01162, over 16266.00 frames. ], tot_loss[loss=0.1287, simple_loss=0.1358, pruned_loss=0.04856, audio_tagging_loss=0.01226, over 3052103.82 frames. ], batch size: 59, lr: 2.60e-02, grad_scale: 32.0 2023-11-18 08:08:41,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=135160.0, ans=0.125 2023-11-18 08:08:45,678 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.35 vs. limit=5.0 2023-11-18 08:08:47,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=135160.0, ans=0.5 2023-11-18 08:08:58,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=135226.66666666666, ans=0.0 2023-11-18 08:09:27,855 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.216e+01 1.092e+02 1.238e+02 1.418e+02 2.138e+02, threshold=2.477e+02, percent-clipped=0.0 2023-11-18 08:09:31,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=135426.66666666666, ans=0.1 2023-11-18 08:09:32,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=135426.66666666666, ans=0.2 2023-11-18 08:09:38,140 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 8300, loss[loss=0.1246, simple_loss=0.1259, pruned_loss=0.04953, audio_tagging_loss=0.01208, over 15852.00 frames. ], tot_loss[loss=0.1296, simple_loss=0.1368, pruned_loss=0.04889, audio_tagging_loss=0.01234, over 3054866.68 frames. ], batch size: 60, lr: 2.60e-02, grad_scale: 32.0 2023-11-18 08:09:49,729 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.34 vs. 
limit=22.5 2023-11-18 08:10:01,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=135626.66666666666, ans=0.0 2023-11-18 08:10:23,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=135760.0, ans=0.1 2023-11-18 08:10:34,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=135826.66666666666, ans=0.2 2023-11-18 08:10:35,049 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 8350, loss[loss=0.1341, simple_loss=0.152, pruned_loss=0.04658, audio_tagging_loss=0.01153, over 15386.00 frames. ], tot_loss[loss=0.1287, simple_loss=0.1364, pruned_loss=0.04824, audio_tagging_loss=0.01229, over 3053609.31 frames. ], batch size: 56, lr: 2.59e-02, grad_scale: 32.0 2023-11-18 08:10:40,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=135826.66666666666, ans=0.125 2023-11-18 08:10:41,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=135826.66666666666, ans=0.0 2023-11-18 08:10:58,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=135960.0, ans=0.125 2023-11-18 08:11:06,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=135960.0, ans=10.0 2023-11-18 08:11:06,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=135960.0, ans=0.125 2023-11-18 08:11:21,584 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 1.030e+02 1.156e+02 1.318e+02 1.873e+02, threshold=2.311e+02, percent-clipped=0.0 2023-11-18 08:11:31,728 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 8400, loss[loss=0.0852, simple_loss=0.07823, pruned_loss=0.02942, audio_tagging_loss=0.01666, over 14581.00 frames. ], tot_loss[loss=0.1273, simple_loss=0.1349, pruned_loss=0.04734, audio_tagging_loss=0.01252, over 3052154.08 frames. ], batch size: 56, lr: 2.59e-02, grad_scale: 32.0 2023-11-18 08:11:49,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=136226.66666666666, ans=0.0 2023-11-18 08:11:54,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=136293.33333333334, ans=0.5 2023-11-18 08:12:08,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=136360.0, ans=0.0 2023-11-18 08:12:28,249 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 8450, loss[loss=0.1089, simple_loss=0.1147, pruned_loss=0.0382, audio_tagging_loss=0.01336, over 14888.00 frames. ], tot_loss[loss=0.1265, simple_loss=0.1335, pruned_loss=0.04707, audio_tagging_loss=0.0127, over 3050111.98 frames. ], batch size: 56, lr: 2.59e-02, grad_scale: 32.0 2023-11-18 08:12:43,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=136560.0, ans=0.125 2023-11-18 08:12:48,186 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.17 vs. 
limit=15.0 2023-11-18 08:13:08,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=136693.33333333334, ans=0.1 2023-11-18 08:13:09,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=136693.33333333334, ans=12.0 2023-11-18 08:13:10,813 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.84 vs. limit=15.0 2023-11-18 08:13:14,556 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.166e+01 1.028e+02 1.129e+02 1.256e+02 1.884e+02, threshold=2.258e+02, percent-clipped=0.0 2023-11-18 08:13:25,414 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 8500, loss[loss=0.1305, simple_loss=0.1356, pruned_loss=0.04741, audio_tagging_loss=0.01535, over 14345.00 frames. ], tot_loss[loss=0.1267, simple_loss=0.1335, pruned_loss=0.04715, audio_tagging_loss=0.01279, over 3047196.03 frames. ], batch size: 56, lr: 2.59e-02, grad_scale: 32.0 2023-11-18 08:14:00,393 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.54 vs. limit=12.0 2023-11-18 08:14:00,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=137026.66666666666, ans=0.2 2023-11-18 08:14:21,077 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 8550, loss[loss=0.1365, simple_loss=0.1437, pruned_loss=0.05227, audio_tagging_loss=0.01244, over 15601.00 frames. ], tot_loss[loss=0.1271, simple_loss=0.134, pruned_loss=0.04741, audio_tagging_loss=0.01276, over 3045545.64 frames. ], batch size: 59, lr: 2.58e-02, grad_scale: 32.0 2023-11-18 08:14:26,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=137160.0, ans=0.1 2023-11-18 08:14:27,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=137160.0, ans=0.125 2023-11-18 08:14:38,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=137226.66666666666, ans=0.2 2023-11-18 08:14:38,704 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.07 vs. limit=15.0 2023-11-18 08:14:42,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=137293.33333333334, ans=0.09899494936611666 2023-11-18 08:14:55,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=137360.0, ans=0.125 2023-11-18 08:15:02,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=137360.0, ans=0.2 2023-11-18 08:15:07,824 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.950e+01 1.016e+02 1.105e+02 1.283e+02 1.880e+02, threshold=2.211e+02, percent-clipped=0.0 2023-11-18 08:15:13,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=137426.66666666666, ans=0.125 2023-11-18 08:15:15,936 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.74 vs. 
limit=15.0 2023-11-18 08:15:18,204 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 8600, loss[loss=0.1549, simple_loss=0.1575, pruned_loss=0.06048, audio_tagging_loss=0.01571, over 14442.00 frames. ], tot_loss[loss=0.1266, simple_loss=0.1332, pruned_loss=0.04727, audio_tagging_loss=0.01275, over 3037521.85 frames. ], batch size: 54, lr: 2.58e-02, grad_scale: 32.0 2023-11-18 08:15:23,025 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.70 vs. limit=22.5 2023-11-18 08:15:29,033 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.43 vs. limit=12.0 2023-11-18 08:15:31,580 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.33 vs. limit=15.0 2023-11-18 08:15:47,794 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.57 vs. limit=22.5 2023-11-18 08:15:48,740 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0 2023-11-18 08:15:49,851 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.92 vs. limit=22.5 2023-11-18 08:15:58,874 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.20 vs. limit=15.0 2023-11-18 08:16:01,691 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. limit=6.0 2023-11-18 08:16:14,989 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 8650, loss[loss=0.1202, simple_loss=0.1295, pruned_loss=0.04379, audio_tagging_loss=0.01171, over 15980.00 frames. ], tot_loss[loss=0.1265, simple_loss=0.1331, pruned_loss=0.04719, audio_tagging_loss=0.01278, over 3041203.26 frames. ], batch size: 60, lr: 2.58e-02, grad_scale: 32.0 2023-11-18 08:16:28,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=137893.33333333334, ans=10.0 2023-11-18 08:16:44,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=137960.0, ans=0.125 2023-11-18 08:16:58,428 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.205e+00 2023-11-18 08:17:00,724 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.36 vs. limit=12.0 2023-11-18 08:17:01,321 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.785e+01 1.026e+02 1.125e+02 1.305e+02 1.898e+02, threshold=2.250e+02, percent-clipped=0.0 2023-11-18 08:17:01,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=138093.33333333334, ans=0.0 2023-11-18 08:17:11,113 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 8700, loss[loss=0.09629, simple_loss=0.1033, pruned_loss=0.03162, audio_tagging_loss=0.01302, over 14932.00 frames. ], tot_loss[loss=0.1269, simple_loss=0.1333, pruned_loss=0.04732, audio_tagging_loss=0.01288, over 3038148.04 frames. 
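The dense run of Whitening lines above compares a per-module statistic against a limit (e.g. metric=21.57 vs. limit=22.5); a metric of this kind reads about 1.0 when the channel covariance of the activations is proportional to the identity, grows with eigenvalue spread, and presumably triggers a corrective gradient only while it exceeds the limit. A hedged reconstruction of such a metric; the exact implementation behind these lines may differ in detail:

    import torch

    # Whiteness metric: mean diagonal of C^2 over the squared mean
    # diagonal of C, where C is the (centered) channel covariance.
    # About 1.0 for white features, larger for anisotropic ones.
    def whitening_metric(x, num_groups=1):
        x = x.reshape(-1, x.shape[-1])            # (frames, channels)
        n, c = x.shape
        cpg = c // num_groups                     # channels per group
        x = x.reshape(n, num_groups, cpg).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)       # centered covariance
        cov = torch.matmul(x.transpose(1, 2), x)  # (groups, cpg, cpg)
        mean_diag = cov.diagonal(dim1=1, dim2=2).mean()
        mean_diag_sq = (cov ** 2).sum() / (num_groups * cpg)
        return (mean_diag_sq / (mean_diag ** 2 + 1e-20)).item()

    x = torch.randn(1000, 256)                    # near-white features
    print(whitening_metric(x))                    # modestly above 1.0
    print(whitening_metric(x @ torch.randn(256, 256)))  # much larger

Under this reading, a line with metric below its limit is merely a report, while metric above the limit is what engages the corrective term.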
], batch size: 57, lr: 2.57e-02, grad_scale: 32.0 2023-11-18 08:17:12,894 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0 2023-11-18 08:17:21,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=138226.66666666666, ans=0.07 2023-11-18 08:17:29,925 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.55 vs. limit=15.0 2023-11-18 08:17:56,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=138426.66666666666, ans=0.0 2023-11-18 08:17:57,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=138426.66666666666, ans=0.125 2023-11-18 08:17:58,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=138426.66666666666, ans=0.015 2023-11-18 08:17:58,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=138426.66666666666, ans=0.2 2023-11-18 08:18:02,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=138426.66666666666, ans=0.1 2023-11-18 08:18:07,689 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 8750, loss[loss=0.1359, simple_loss=0.1546, pruned_loss=0.04803, audio_tagging_loss=0.01053, over 15284.00 frames. ], tot_loss[loss=0.1287, simple_loss=0.1356, pruned_loss=0.0481, audio_tagging_loss=0.01276, over 3044732.83 frames. ], batch size: 57, lr: 2.57e-02, grad_scale: 32.0 2023-11-18 08:18:08,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=138493.33333333334, ans=0.95 2023-11-18 08:18:24,279 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.60 vs. limit=15.0 2023-11-18 08:18:36,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=138626.66666666666, ans=0.0 2023-11-18 08:18:38,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=138626.66666666666, ans=0.2 2023-11-18 08:18:55,033 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.161e+01 1.039e+02 1.204e+02 1.359e+02 1.963e+02, threshold=2.408e+02, percent-clipped=0.0 2023-11-18 08:18:58,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=138760.0, ans=0.0 2023-11-18 08:19:05,272 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 8800, loss[loss=0.08543, simple_loss=0.08604, pruned_loss=0.0271, audio_tagging_loss=0.01531, over 14793.00 frames. ], tot_loss[loss=0.1297, simple_loss=0.1369, pruned_loss=0.04851, audio_tagging_loss=0.01274, over 3049828.86 frames. 
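The batch size printed on each summary line wanders (52 to 64 in the lines above) because batches are packed by total duration rather than by a fixed count of utterances. A sketch of duration-budget batching; the 1000-second budget is an assumed value:

    # Duration-budget batching: accumulate cuts until adding one more
    # would exceed max_duration, so the cut count per batch varies with
    # utterance length. The 1000.0 budget is an assumption.
    def duration_batches(cuts, max_duration=1000.0):
        batch, total = [], 0.0
        for cut in cuts:            # cut: any object with a .duration
            if batch and total + cut.duration > max_duration:
                yield batch
                batch, total = [], 0.0
            batch.append(cut)
            total += cut.duration
        if batch:
            yield batch

Mixing short AudioSet clips with longer LibriSpeech utterances under one duration budget is what would produce the spread of batch sizes seen here.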
], batch size: 59, lr: 2.57e-02, grad_scale: 32.0 2023-11-18 08:19:08,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=138826.66666666666, ans=0.125 2023-11-18 08:19:25,431 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 08:19:26,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=138960.0, ans=0.1 2023-11-18 08:19:53,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=139093.33333333334, ans=0.125 2023-11-18 08:19:54,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=139093.33333333334, ans=0.0 2023-11-18 08:19:55,614 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=17.08 vs. limit=15.0 2023-11-18 08:19:56,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=139093.33333333334, ans=0.0 2023-11-18 08:20:01,561 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 8850, loss[loss=0.1001, simple_loss=0.1083, pruned_loss=0.03314, audio_tagging_loss=0.0128, over 16935.00 frames. ], tot_loss[loss=0.129, simple_loss=0.1361, pruned_loss=0.04818, audio_tagging_loss=0.01273, over 3057715.14 frames. ], batch size: 67, lr: 2.57e-02, grad_scale: 32.0 2023-11-18 08:20:05,338 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.63 vs. limit=15.0 2023-11-18 08:20:09,076 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 08:20:11,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=139226.66666666666, ans=0.125 2023-11-18 08:20:12,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=139226.66666666666, ans=0.2 2023-11-18 08:20:20,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=139226.66666666666, ans=0.125 2023-11-18 08:20:37,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=139360.0, ans=0.025 2023-11-18 08:20:39,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=139360.0, ans=0.125 2023-11-18 08:20:42,813 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.28 vs. 
limit=15.0 2023-11-18 08:20:43,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=139360.0, ans=0.125 2023-11-18 08:20:47,634 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.898e+01 1.044e+02 1.177e+02 1.340e+02 1.901e+02, threshold=2.354e+02, percent-clipped=0.0 2023-11-18 08:20:48,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=139426.66666666666, ans=0.0 2023-11-18 08:20:51,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=139426.66666666666, ans=0.125 2023-11-18 08:20:57,244 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 8900, loss[loss=0.1227, simple_loss=0.1302, pruned_loss=0.04446, audio_tagging_loss=0.01313, over 15620.00 frames. ], tot_loss[loss=0.1276, simple_loss=0.135, pruned_loss=0.0474, audio_tagging_loss=0.01272, over 3057956.72 frames. ], batch size: 57, lr: 2.56e-02, grad_scale: 32.0 2023-11-18 08:21:11,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=139560.0, ans=0.1 2023-11-18 08:21:19,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=139626.66666666666, ans=0.0 2023-11-18 08:21:40,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=139693.33333333334, ans=0.2 2023-11-18 08:21:41,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=139760.0, ans=0.0 2023-11-18 08:21:54,082 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 8950, loss[loss=0.1331, simple_loss=0.1361, pruned_loss=0.04928, audio_tagging_loss=0.01576, over 15572.00 frames. ], tot_loss[loss=0.1281, simple_loss=0.1357, pruned_loss=0.04767, audio_tagging_loss=0.01256, over 3063174.23 frames. ], batch size: 57, lr: 2.56e-02, grad_scale: 32.0 2023-11-18 08:22:18,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=139960.0, ans=0.0 2023-11-18 08:22:19,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=139960.0, ans=0.125 2023-11-18 08:22:29,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=140026.66666666666, ans=0.125 2023-11-18 08:22:29,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=140026.66666666666, ans=0.125 2023-11-18 08:22:41,390 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.130e+01 1.003e+02 1.129e+02 1.259e+02 1.857e+02, threshold=2.258e+02, percent-clipped=0.0 2023-11-18 08:22:42,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=140093.33333333334, ans=0.125 2023-11-18 08:22:43,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=140093.33333333334, ans=0.125 2023-11-18 08:22:51,052 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 9000, loss[loss=0.1173, simple_loss=0.1278, pruned_loss=0.0425, audio_tagging_loss=0.01092, over 14426.00 frames. 
], tot_loss[loss=0.1289, simple_loss=0.1367, pruned_loss=0.04818, audio_tagging_loss=0.01236, over 3057412.35 frames. ], batch size: 56, lr: 2.56e-02, grad_scale: 32.0 2023-11-18 08:22:51,053 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-18 08:23:20,756 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8117, 2.8058, 4.7611, 4.4250], device='cuda:3') 2023-11-18 08:23:26,294 INFO [train_asr.py:1147] (3/4) Epoch 2, validation: loss=0.08723, simple_loss=0.06802, pruned_loss=0.01417, audio_tagging_loss=0.03906, over 4681554.00 frames. 2023-11-18 08:23:26,294 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-18 08:23:37,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=140226.66666666666, ans=0.05 2023-11-18 08:23:40,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=140226.66666666666, ans=0.125 2023-11-18 08:23:41,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=140226.66666666666, ans=0.2 2023-11-18 08:23:51,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=140293.33333333334, ans=0.125 2023-11-18 08:23:53,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=140293.33333333334, ans=0.04949747468305833 2023-11-18 08:24:02,417 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.19 vs. limit=12.0 2023-11-18 08:24:04,492 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.23 vs. limit=15.0 2023-11-18 08:24:13,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=140426.66666666666, ans=0.1 2023-11-18 08:24:19,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=140426.66666666666, ans=0.0 2023-11-18 08:24:22,612 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 9050, loss[loss=0.1282, simple_loss=0.1334, pruned_loss=0.04877, audio_tagging_loss=0.01274, over 14915.00 frames. ], tot_loss[loss=0.1294, simple_loss=0.1373, pruned_loss=0.04838, audio_tagging_loss=0.01232, over 3062499.59 frames. ], batch size: 57, lr: 2.56e-02, grad_scale: 32.0 2023-11-18 08:24:35,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=140560.0, ans=0.0 2023-11-18 08:24:49,472 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.03 vs. limit=15.0 2023-11-18 08:25:08,332 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.490e+01 1.025e+02 1.134e+02 1.283e+02 1.776e+02, threshold=2.268e+02, percent-clipped=0.0 2023-11-18 08:25:18,126 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 9100, loss[loss=0.09386, simple_loss=0.09515, pruned_loss=0.03118, audio_tagging_loss=0.01511, over 14279.00 frames. ], tot_loss[loss=0.1285, simple_loss=0.1365, pruned_loss=0.04793, audio_tagging_loss=0.01233, over 3060432.76 frames. 
], batch size: 58, lr: 2.55e-02, grad_scale: 32.0 2023-11-18 08:25:26,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=140826.66666666666, ans=0.125 2023-11-18 08:25:26,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=140826.66666666666, ans=0.0 2023-11-18 08:25:29,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=140893.33333333334, ans=0.2 2023-11-18 08:25:51,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=141026.66666666666, ans=0.09899494936611666 2023-11-18 08:25:56,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=141026.66666666666, ans=0.95 2023-11-18 08:26:13,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=141160.0, ans=0.125 2023-11-18 08:26:13,890 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.62 vs. limit=15.0 2023-11-18 08:26:15,083 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 9150, loss[loss=0.108, simple_loss=0.1116, pruned_loss=0.03997, audio_tagging_loss=0.01225, over 15428.00 frames. ], tot_loss[loss=0.1281, simple_loss=0.1358, pruned_loss=0.04785, audio_tagging_loss=0.01237, over 3052989.97 frames. ], batch size: 58, lr: 2.55e-02, grad_scale: 32.0 2023-11-18 08:26:16,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=141160.0, ans=0.1 2023-11-18 08:26:19,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=141160.0, ans=0.125 2023-11-18 08:26:43,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=141293.33333333334, ans=0.125 2023-11-18 08:26:50,859 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.80 vs. limit=22.5 2023-11-18 08:27:01,447 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.257e+01 1.062e+02 1.145e+02 1.276e+02 2.030e+02, threshold=2.290e+02, percent-clipped=0.0 2023-11-18 08:27:12,363 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 9200, loss[loss=0.09134, simple_loss=0.09074, pruned_loss=0.03118, audio_tagging_loss=0.01479, over 15076.00 frames. ], tot_loss[loss=0.1276, simple_loss=0.1351, pruned_loss=0.04763, audio_tagging_loss=0.01241, over 3051831.98 frames. ], batch size: 59, lr: 2.55e-02, grad_scale: 32.0 2023-11-18 08:27:21,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=141493.33333333334, ans=0.2 2023-11-18 08:27:51,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=141693.33333333334, ans=0.1 2023-11-18 08:28:01,514 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 08:28:08,766 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 9250, loss[loss=0.07609, simple_loss=0.08288, pruned_loss=0.02109, audio_tagging_loss=0.01356, over 14181.00 frames. 
], tot_loss[loss=0.1269, simple_loss=0.1344, pruned_loss=0.04737, audio_tagging_loss=0.01234, over 3049631.33 frames. ], batch size: 56, lr: 2.54e-02, grad_scale: 32.0 2023-11-18 08:28:11,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=141826.66666666666, ans=0.0 2023-11-18 08:28:12,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=141826.66666666666, ans=0.0 2023-11-18 08:28:13,251 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 08:28:24,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=141893.33333333334, ans=0.125 2023-11-18 08:28:55,016 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.910e+01 1.033e+02 1.140e+02 1.302e+02 2.365e+02, threshold=2.281e+02, percent-clipped=1.0 2023-11-18 08:29:04,822 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 9300, loss[loss=0.1212, simple_loss=0.1231, pruned_loss=0.04825, audio_tagging_loss=0.0114, over 14785.00 frames. ], tot_loss[loss=0.1267, simple_loss=0.1341, pruned_loss=0.04712, audio_tagging_loss=0.01254, over 3054519.68 frames. ], batch size: 59, lr: 2.54e-02, grad_scale: 32.0 2023-11-18 08:29:16,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=142226.66666666666, ans=0.125 2023-11-18 08:29:30,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=142293.33333333334, ans=0.0 2023-11-18 08:29:40,761 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.23 vs. limit=15.0 2023-11-18 08:30:01,722 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 9350, loss[loss=0.1205, simple_loss=0.1219, pruned_loss=0.04732, audio_tagging_loss=0.01221, over 14534.00 frames. ], tot_loss[loss=0.1271, simple_loss=0.1343, pruned_loss=0.04741, audio_tagging_loss=0.0125, over 3054454.85 frames. ], batch size: 56, lr: 2.54e-02, grad_scale: 32.0 2023-11-18 08:30:04,891 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.44 vs. limit=22.5 2023-11-18 08:30:11,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=142493.33333333334, ans=0.07 2023-11-18 08:30:23,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=142626.66666666666, ans=0.125 2023-11-18 08:30:38,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=142693.33333333334, ans=0.125 2023-11-18 08:30:48,962 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.103e+01 1.054e+02 1.142e+02 1.283e+02 1.990e+02, threshold=2.284e+02, percent-clipped=0.0 2023-11-18 08:30:54,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=142760.0, ans=0.125 2023-11-18 08:30:59,168 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 9400, loss[loss=0.1465, simple_loss=0.1544, pruned_loss=0.05523, audio_tagging_loss=0.0141, over 15106.00 frames. 
], tot_loss[loss=0.1279, simple_loss=0.135, pruned_loss=0.04779, audio_tagging_loss=0.01255, over 3056068.13 frames. ], batch size: 57, lr: 2.54e-02, grad_scale: 32.0 2023-11-18 08:31:06,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=142826.66666666666, ans=0.1 2023-11-18 08:31:28,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=142960.0, ans=0.125 2023-11-18 08:31:39,989 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=15.0 2023-11-18 08:31:44,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=143093.33333333334, ans=0.05 2023-11-18 08:31:50,660 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 08:31:54,936 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 9450, loss[loss=0.1199, simple_loss=0.1244, pruned_loss=0.0441, audio_tagging_loss=0.01357, over 14295.00 frames. ], tot_loss[loss=0.1283, simple_loss=0.1356, pruned_loss=0.0478, audio_tagging_loss=0.01266, over 3064339.54 frames. ], batch size: 57, lr: 2.53e-02, grad_scale: 32.0 2023-11-18 08:31:57,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=143160.0, ans=0.0 2023-11-18 08:31:58,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=143160.0, ans=0.0 2023-11-18 08:32:00,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=143160.0, ans=0.125 2023-11-18 08:32:05,606 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.15 vs. limit=10.0 2023-11-18 08:32:10,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=143226.66666666666, ans=0.125 2023-11-18 08:32:18,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=143293.33333333334, ans=0.0 2023-11-18 08:32:30,035 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.78 vs. limit=22.5 2023-11-18 08:32:32,133 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.11 vs. 
limit=22.5 2023-11-18 08:32:41,714 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.457e+01 1.024e+02 1.132e+02 1.318e+02 2.507e+02, threshold=2.264e+02, percent-clipped=1.0 2023-11-18 08:32:42,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=143426.66666666666, ans=0.1 2023-11-18 08:32:51,325 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 9500, loss[loss=0.1323, simple_loss=0.1375, pruned_loss=0.04775, audio_tagging_loss=0.01574, over 15731.00 frames. ], tot_loss[loss=0.1279, simple_loss=0.1353, pruned_loss=0.0475, audio_tagging_loss=0.01276, over 3062360.56 frames. ], batch size: 61, lr: 2.53e-02, grad_scale: 32.0 2023-11-18 08:32:53,211 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 08:33:01,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=143493.33333333334, ans=0.1 2023-11-18 08:33:02,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=143560.0, ans=0.0 2023-11-18 08:33:05,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=143560.0, ans=0.0 2023-11-18 08:33:19,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=143626.66666666666, ans=0.125 2023-11-18 08:33:36,146 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.203e-03 2023-11-18 08:33:48,288 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 9550, loss[loss=0.1087, simple_loss=0.1212, pruned_loss=0.03497, audio_tagging_loss=0.01312, over 14748.00 frames. ], tot_loss[loss=0.1281, simple_loss=0.1352, pruned_loss=0.04753, audio_tagging_loss=0.01298, over 3050800.02 frames. ], batch size: 57, lr: 2.53e-02, grad_scale: 32.0 2023-11-18 08:33:55,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=143826.66666666666, ans=0.125 2023-11-18 08:34:34,603 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.739e+01 9.724e+01 1.130e+02 1.324e+02 2.108e+02, threshold=2.261e+02, percent-clipped=0.0 2023-11-18 08:34:38,945 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.27 vs. limit=22.5 2023-11-18 08:34:44,699 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 9600, loss[loss=0.1153, simple_loss=0.1303, pruned_loss=0.04002, audio_tagging_loss=0.01018, over 14706.00 frames. ], tot_loss[loss=0.1268, simple_loss=0.1339, pruned_loss=0.04685, audio_tagging_loss=0.01297, over 3053387.67 frames. ], batch size: 56, lr: 2.53e-02, grad_scale: 32.0 2023-11-18 08:35:02,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=144226.66666666666, ans=0.95 2023-11-18 08:35:10,606 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.82 vs. limit=15.0 2023-11-18 08:35:21,264 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.38 vs. 
limit=22.5 2023-11-18 08:35:38,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=144426.66666666666, ans=0.125 2023-11-18 08:35:40,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=144493.33333333334, ans=0.125 2023-11-18 08:35:41,218 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 9650, loss[loss=0.1303, simple_loss=0.1385, pruned_loss=0.04722, audio_tagging_loss=0.01379, over 14470.00 frames. ], tot_loss[loss=0.1273, simple_loss=0.134, pruned_loss=0.04733, audio_tagging_loss=0.01295, over 3051541.47 frames. ], batch size: 55, lr: 2.52e-02, grad_scale: 32.0 2023-11-18 08:35:58,999 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.96 vs. limit=15.0 2023-11-18 08:36:11,135 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.42 vs. limit=6.0 2023-11-18 08:36:17,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=144693.33333333334, ans=0.0 2023-11-18 08:36:20,183 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.47 vs. limit=22.5 2023-11-18 08:36:27,235 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. limit=6.0 2023-11-18 08:36:27,674 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.399e+01 1.034e+02 1.159e+02 1.347e+02 1.813e+02, threshold=2.318e+02, percent-clipped=0.0 2023-11-18 08:36:38,026 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 9700, loss[loss=0.1205, simple_loss=0.1151, pruned_loss=0.04933, audio_tagging_loss=0.01361, over 14777.00 frames. ], tot_loss[loss=0.1259, simple_loss=0.1324, pruned_loss=0.04683, audio_tagging_loss=0.01283, over 3052472.94 frames. ], batch size: 58, lr: 2.52e-02, grad_scale: 32.0 2023-11-18 08:36:43,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=144826.66666666666, ans=0.025 2023-11-18 08:36:59,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=144960.0, ans=0.0 2023-11-18 08:37:22,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=145093.33333333334, ans=0.0 2023-11-18 08:37:24,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=145093.33333333334, ans=0.125 2023-11-18 08:37:34,000 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 9750, loss[loss=0.1236, simple_loss=0.1412, pruned_loss=0.04445, audio_tagging_loss=0.008534, over 15847.00 frames. ], tot_loss[loss=0.1263, simple_loss=0.1335, pruned_loss=0.04698, audio_tagging_loss=0.01254, over 3053662.52 frames. 
], batch size: 60, lr: 2.52e-02, grad_scale: 32.0 2023-11-18 08:37:52,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=145226.66666666666, ans=0.2 2023-11-18 08:38:01,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=145293.33333333334, ans=0.125 2023-11-18 08:38:02,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=145293.33333333334, ans=0.125 2023-11-18 08:38:08,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=145360.0, ans=0.125 2023-11-18 08:38:18,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=145426.66666666666, ans=0.125 2023-11-18 08:38:19,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=145426.66666666666, ans=0.0 2023-11-18 08:38:21,158 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.307e+01 1.005e+02 1.144e+02 1.318e+02 1.775e+02, threshold=2.288e+02, percent-clipped=0.0 2023-11-18 08:38:31,542 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 9800, loss[loss=0.1042, simple_loss=0.1135, pruned_loss=0.033, audio_tagging_loss=0.01444, over 16447.00 frames. ], tot_loss[loss=0.1257, simple_loss=0.1332, pruned_loss=0.04663, audio_tagging_loss=0.01245, over 3053026.63 frames. ], batch size: 65, lr: 2.52e-02, grad_scale: 32.0 2023-11-18 08:38:53,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=145626.66666666666, ans=0.1 2023-11-18 08:38:56,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=145626.66666666666, ans=0.125 2023-11-18 08:39:08,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=145693.33333333334, ans=0.125 2023-11-18 08:39:19,459 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 08:39:28,518 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 9850, loss[loss=0.1101, simple_loss=0.1192, pruned_loss=0.04009, audio_tagging_loss=0.01039, over 15319.00 frames. ], tot_loss[loss=0.1257, simple_loss=0.1332, pruned_loss=0.0467, audio_tagging_loss=0.01235, over 3048354.47 frames. ], batch size: 56, lr: 2.51e-02, grad_scale: 32.0 2023-11-18 08:39:41,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=145893.33333333334, ans=0.125 2023-11-18 08:39:53,491 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.04 vs. 
limit=15.0 2023-11-18 08:39:59,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=145960.0, ans=0.1 2023-11-18 08:40:03,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=146026.66666666666, ans=0.0 2023-11-18 08:40:14,766 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.226e+01 1.021e+02 1.122e+02 1.308e+02 2.084e+02, threshold=2.244e+02, percent-clipped=0.0 2023-11-18 08:40:24,457 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 9900, loss[loss=0.1339, simple_loss=0.1389, pruned_loss=0.05267, audio_tagging_loss=0.01179, over 16134.00 frames. ], tot_loss[loss=0.1258, simple_loss=0.1334, pruned_loss=0.04683, audio_tagging_loss=0.01227, over 3050630.23 frames. ], batch size: 58, lr: 2.51e-02, grad_scale: 64.0 2023-11-18 08:40:30,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=146160.0, ans=0.0 2023-11-18 08:40:44,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=146226.66666666666, ans=0.1 2023-11-18 08:41:03,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=146360.0, ans=0.04949747468305833 2023-11-18 08:41:19,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=146493.33333333334, ans=0.125 2023-11-18 08:41:20,911 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 9950, loss[loss=0.1283, simple_loss=0.141, pruned_loss=0.04692, audio_tagging_loss=0.01085, over 14825.00 frames. ], tot_loss[loss=0.1245, simple_loss=0.132, pruned_loss=0.04606, audio_tagging_loss=0.0124, over 3049932.15 frames. ], batch size: 56, lr: 2.51e-02, grad_scale: 64.0 2023-11-18 08:41:21,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=146493.33333333334, ans=0.125 2023-11-18 08:41:40,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=146560.0, ans=0.125 2023-11-18 08:41:52,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=146626.66666666666, ans=0.0 2023-11-18 08:41:54,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=146693.33333333334, ans=0.025 2023-11-18 08:42:07,454 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.831e+01 1.033e+02 1.176e+02 1.296e+02 1.958e+02, threshold=2.352e+02, percent-clipped=0.0 2023-11-18 08:42:09,108 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.41 vs. limit=22.5 2023-11-18 08:42:13,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=146760.0, ans=0.125 2023-11-18 08:42:13,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=146760.0, ans=0.125 2023-11-18 08:42:18,282 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 10000, loss[loss=0.1541, simple_loss=0.1602, pruned_loss=0.06305, audio_tagging_loss=0.01093, over 14978.00 frames. 
], tot_loss[loss=0.125, simple_loss=0.1323, pruned_loss=0.04634, audio_tagging_loss=0.01245, over 3056319.66 frames. ], batch size: 55, lr: 2.51e-02, grad_scale: 64.0 2023-11-18 08:42:38,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=146893.33333333334, ans=0.0 2023-11-18 08:42:47,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=146960.0, ans=0.125 2023-11-18 08:43:00,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=147026.66666666666, ans=0.125 2023-11-18 08:43:11,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=147093.33333333334, ans=10.0 2023-11-18 08:43:14,428 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 10050, loss[loss=0.1171, simple_loss=0.1294, pruned_loss=0.03868, audio_tagging_loss=0.01379, over 15933.00 frames. ], tot_loss[loss=0.1243, simple_loss=0.1315, pruned_loss=0.0461, audio_tagging_loss=0.01252, over 3059492.68 frames. ], batch size: 59, lr: 2.50e-02, grad_scale: 64.0 2023-11-18 08:43:21,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=147160.0, ans=0.125 2023-11-18 08:43:24,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=147226.66666666666, ans=0.125 2023-11-18 08:43:29,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=147226.66666666666, ans=0.125 2023-11-18 08:43:40,297 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0 2023-11-18 08:43:44,000 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.43 vs. limit=10.0 2023-11-18 08:43:51,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=147360.0, ans=0.125 2023-11-18 08:43:51,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=147360.0, ans=0.0 2023-11-18 08:43:58,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=147426.66666666666, ans=0.2 2023-11-18 08:44:01,690 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.399e+01 9.828e+01 1.108e+02 1.232e+02 2.122e+02, threshold=2.217e+02, percent-clipped=0.0 2023-11-18 08:44:10,753 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 10100, loss[loss=0.1312, simple_loss=0.1323, pruned_loss=0.05318, audio_tagging_loss=0.0118, over 15655.00 frames. ], tot_loss[loss=0.1247, simple_loss=0.1319, pruned_loss=0.04616, audio_tagging_loss=0.01254, over 3060330.21 frames. 
], batch size: 59, lr: 2.50e-02, grad_scale: 32.0 2023-11-18 08:44:21,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=147560.0, ans=0.0 2023-11-18 08:44:35,301 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 08:44:38,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=147626.66666666666, ans=0.125 2023-11-18 08:44:52,696 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 08:45:08,354 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 10150, loss[loss=0.1, simple_loss=0.1062, pruned_loss=0.03684, audio_tagging_loss=0.01006, over 15596.00 frames. ], tot_loss[loss=0.1257, simple_loss=0.133, pruned_loss=0.04669, audio_tagging_loss=0.01252, over 3066308.00 frames. ], batch size: 59, lr: 2.50e-02, grad_scale: 32.0 2023-11-18 08:45:08,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=147826.66666666666, ans=0.1 2023-11-18 08:45:12,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=147826.66666666666, ans=0.04949747468305833 2023-11-18 08:45:20,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=147893.33333333334, ans=0.0 2023-11-18 08:45:21,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=147893.33333333334, ans=0.0 2023-11-18 08:45:24,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=147893.33333333334, ans=0.125 2023-11-18 08:45:30,109 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 08:45:33,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=147960.0, ans=0.1 2023-11-18 08:45:39,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=147960.0, ans=0.1 2023-11-18 08:45:55,677 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.369e+01 1.049e+02 1.137e+02 1.279e+02 1.864e+02, threshold=2.275e+02, percent-clipped=0.0 2023-11-18 08:46:03,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=148160.0, ans=0.0 2023-11-18 08:46:04,222 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 10200, loss[loss=0.09685, simple_loss=0.1041, pruned_loss=0.03067, audio_tagging_loss=0.01415, over 15778.00 frames. ], tot_loss[loss=0.1254, simple_loss=0.1327, pruned_loss=0.04648, audio_tagging_loss=0.01253, over 3064497.71 frames. ], batch size: 60, lr: 2.50e-02, grad_scale: 32.0 2023-11-18 08:46:08,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=148160.0, ans=0.125 2023-11-18 08:46:12,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=148160.0, ans=0.125 2023-11-18 08:46:20,812 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 08:46:21,455 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.44 vs. limit=6.0 2023-11-18 08:46:29,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=148293.33333333334, ans=0.125 2023-11-18 08:46:37,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=148360.0, ans=0.1 2023-11-18 08:46:38,269 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.57 vs. limit=15.0 2023-11-18 08:46:40,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=148360.0, ans=0.2 2023-11-18 08:46:45,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=148360.0, ans=0.125 2023-11-18 08:46:53,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=148426.66666666666, ans=0.0 2023-11-18 08:46:58,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=148426.66666666666, ans=0.0 2023-11-18 08:47:00,844 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 10250, loss[loss=0.1602, simple_loss=0.1748, pruned_loss=0.05932, audio_tagging_loss=0.01346, over 14959.00 frames. 
], tot_loss[loss=0.1259, simple_loss=0.1334, pruned_loss=0.04669, audio_tagging_loss=0.01256, over 3054138.25 frames. ], batch size: 54, lr: 2.49e-02, grad_scale: 32.0 2023-11-18 08:47:17,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=148560.0, ans=0.125 2023-11-18 08:47:40,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=148693.33333333334, ans=0.125 2023-11-18 08:47:42,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=148693.33333333334, ans=15.0 2023-11-18 08:47:48,256 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.684e+01 1.009e+02 1.120e+02 1.271e+02 1.895e+02, threshold=2.240e+02, percent-clipped=0.0 2023-11-18 08:47:50,991 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.35 vs. limit=15.0 2023-11-18 08:47:58,052 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 10300, loss[loss=0.1229, simple_loss=0.1345, pruned_loss=0.04109, audio_tagging_loss=0.01453, over 14728.00 frames. ], tot_loss[loss=0.1266, simple_loss=0.1338, pruned_loss=0.04699, audio_tagging_loss=0.01269, over 3049268.73 frames. ], batch size: 56, lr: 2.49e-02, grad_scale: 32.0 2023-11-18 08:47:59,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=148826.66666666666, ans=0.125 2023-11-18 08:48:20,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=148960.0, ans=0.125 2023-11-18 08:48:33,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=149026.66666666666, ans=0.0 2023-11-18 08:48:37,488 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.91 vs. limit=15.0 2023-11-18 08:48:53,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=149160.0, ans=0.125 2023-11-18 08:48:54,301 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 10350, loss[loss=0.1542, simple_loss=0.1707, pruned_loss=0.0616, audio_tagging_loss=0.007308, over 15929.00 frames. ], tot_loss[loss=0.1276, simple_loss=0.1353, pruned_loss=0.04734, audio_tagging_loss=0.01259, over 3053597.37 frames. ], batch size: 56, lr: 2.49e-02, grad_scale: 32.0 2023-11-18 08:49:03,359 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.32 vs. limit=22.5 2023-11-18 08:49:19,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=149293.33333333334, ans=0.0 2023-11-18 08:49:19,519 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.20 vs. 
limit=22.5 2023-11-18 08:49:22,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=149293.33333333334, ans=0.2 2023-11-18 08:49:26,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=149293.33333333334, ans=0.125 2023-11-18 08:49:33,654 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.98 vs. limit=15.0 2023-11-18 08:49:41,444 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.630e+01 9.879e+01 1.106e+02 1.235e+02 1.803e+02, threshold=2.212e+02, percent-clipped=0.0 2023-11-18 08:49:42,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=149426.66666666666, ans=0.0 2023-11-18 08:49:50,048 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 10400, loss[loss=0.1012, simple_loss=0.1116, pruned_loss=0.03265, audio_tagging_loss=0.01277, over 13931.00 frames. ], tot_loss[loss=0.1267, simple_loss=0.1343, pruned_loss=0.04684, audio_tagging_loss=0.0127, over 3051164.68 frames. ], batch size: 53, lr: 2.49e-02, grad_scale: 32.0 2023-11-18 08:50:00,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=149560.0, ans=0.125 2023-11-18 08:50:03,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=149560.0, ans=0.1 2023-11-18 08:50:16,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=149626.66666666666, ans=0.0 2023-11-18 08:50:17,810 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=1.97 vs. limit=15.0 2023-11-18 08:50:33,129 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2023-11-18 08:50:35,145 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.95 vs. limit=10.0 2023-11-18 08:50:46,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=149826.66666666666, ans=0.0 2023-11-18 08:50:47,490 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 10450, loss[loss=0.1169, simple_loss=0.1242, pruned_loss=0.04249, audio_tagging_loss=0.01234, over 15256.00 frames. ], tot_loss[loss=0.1258, simple_loss=0.1334, pruned_loss=0.04648, audio_tagging_loss=0.01261, over 3046733.48 frames. ], batch size: 57, lr: 2.48e-02, grad_scale: 32.0 2023-11-18 08:50:59,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=149893.33333333334, ans=0.125 2023-11-18 08:50:59,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=149893.33333333334, ans=0.0 2023-11-18 08:51:06,289 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.79 vs. 
limit=15.0 2023-11-18 08:51:23,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=150026.66666666666, ans=0.035 2023-11-18 08:51:31,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=150093.33333333334, ans=0.125 2023-11-18 08:51:33,617 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.68 vs. limit=15.0 2023-11-18 08:51:35,242 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.869e+01 9.874e+01 1.064e+02 1.233e+02 1.785e+02, threshold=2.128e+02, percent-clipped=0.0 2023-11-18 08:51:41,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=150093.33333333334, ans=0.0 2023-11-18 08:51:44,335 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 10500, loss[loss=0.1314, simple_loss=0.1397, pruned_loss=0.04671, audio_tagging_loss=0.01482, over 15304.00 frames. ], tot_loss[loss=0.1264, simple_loss=0.1343, pruned_loss=0.04674, audio_tagging_loss=0.01249, over 3051225.35 frames. ], batch size: 59, lr: 2.48e-02, grad_scale: 32.0 2023-11-18 08:51:45,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=150160.0, ans=0.125 2023-11-18 08:51:55,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=150226.66666666666, ans=0.1 2023-11-18 08:52:18,807 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.94 vs. limit=10.0 2023-11-18 08:52:39,242 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 08:52:39,970 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 10550, loss[loss=0.1175, simple_loss=0.1183, pruned_loss=0.04408, audio_tagging_loss=0.01425, over 14216.00 frames. ], tot_loss[loss=0.1256, simple_loss=0.1337, pruned_loss=0.04642, audio_tagging_loss=0.01229, over 3058611.33 frames. ], batch size: 53, lr: 2.48e-02, grad_scale: 32.0 2023-11-18 08:52:51,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=150560.0, ans=0.2 2023-11-18 08:53:27,566 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.461e+01 9.723e+01 1.093e+02 1.257e+02 1.576e+02, threshold=2.186e+02, percent-clipped=0.0 2023-11-18 08:53:37,313 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 10600, loss[loss=0.1085, simple_loss=0.1151, pruned_loss=0.03846, audio_tagging_loss=0.01246, over 14606.00 frames. ], tot_loss[loss=0.1259, simple_loss=0.1341, pruned_loss=0.04657, audio_tagging_loss=0.01232, over 3056906.33 frames. ], batch size: 54, lr: 2.48e-02, grad_scale: 32.0 2023-11-18 08:53:49,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=150893.33333333334, ans=0.1 2023-11-18 08:54:09,268 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.44 vs. 
limit=15.0 2023-11-18 08:54:19,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=151026.66666666666, ans=0.1 2023-11-18 08:54:20,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=151026.66666666666, ans=0.125 2023-11-18 08:54:21,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=151093.33333333334, ans=0.0 2023-11-18 08:54:23,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=151093.33333333334, ans=0.125 2023-11-18 08:54:33,702 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 10650, loss[loss=0.1187, simple_loss=0.1323, pruned_loss=0.04057, audio_tagging_loss=0.01201, over 15166.00 frames. ], tot_loss[loss=0.1261, simple_loss=0.1344, pruned_loss=0.04668, audio_tagging_loss=0.01222, over 3053486.29 frames. ], batch size: 57, lr: 2.47e-02, grad_scale: 32.0 2023-11-18 08:54:45,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=151226.66666666666, ans=0.0 2023-11-18 08:55:21,757 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.938e+01 1.018e+02 1.108e+02 1.279e+02 1.939e+02, threshold=2.217e+02, percent-clipped=0.0 2023-11-18 08:55:30,334 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 10700, loss[loss=0.1218, simple_loss=0.1301, pruned_loss=0.04493, audio_tagging_loss=0.01183, over 15527.00 frames. ], tot_loss[loss=0.1258, simple_loss=0.134, pruned_loss=0.04656, audio_tagging_loss=0.01225, over 3050535.70 frames. ], batch size: 59, lr: 2.47e-02, grad_scale: 32.0 2023-11-18 08:55:31,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=151493.33333333334, ans=0.125 2023-11-18 08:55:31,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=151493.33333333334, ans=0.125 2023-11-18 08:56:26,872 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 10750, loss[loss=0.1199, simple_loss=0.1317, pruned_loss=0.04303, audio_tagging_loss=0.01097, over 15524.00 frames. ], tot_loss[loss=0.1258, simple_loss=0.1341, pruned_loss=0.04649, audio_tagging_loss=0.0122, over 3049141.53 frames. ], batch size: 58, lr: 2.47e-02, grad_scale: 32.0 2023-11-18 08:56:51,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=151960.0, ans=0.125 2023-11-18 08:56:58,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=151960.0, ans=0.125 2023-11-18 08:57:12,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=152093.33333333334, ans=0.125 2023-11-18 08:57:14,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=152093.33333333334, ans=0.125 2023-11-18 08:57:14,951 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.096e+01 9.809e+01 1.098e+02 1.227e+02 2.197e+02, threshold=2.197e+02, percent-clipped=0.0 2023-11-18 08:57:18,165 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.82 vs. 
limit=10.0 2023-11-18 08:57:18,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=152093.33333333334, ans=10.0 2023-11-18 08:57:24,158 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 10800, loss[loss=0.1032, simple_loss=0.09344, pruned_loss=0.03848, audio_tagging_loss=0.01802, over 15170.00 frames. ], tot_loss[loss=0.1243, simple_loss=0.1325, pruned_loss=0.0458, audio_tagging_loss=0.01228, over 3056037.91 frames. ], batch size: 60, lr: 2.47e-02, grad_scale: 32.0 2023-11-18 08:57:28,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=152160.0, ans=0.0 2023-11-18 08:57:41,607 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.10 vs. limit=10.0 2023-11-18 08:57:56,896 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.02 vs. limit=10.0 2023-11-18 08:57:59,856 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.38 vs. limit=15.0 2023-11-18 08:58:15,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=152426.66666666666, ans=0.05 2023-11-18 08:58:21,121 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 10850, loss[loss=0.08866, simple_loss=0.09224, pruned_loss=0.02975, audio_tagging_loss=0.01279, over 16196.00 frames. ], tot_loss[loss=0.1242, simple_loss=0.1323, pruned_loss=0.04571, audio_tagging_loss=0.01231, over 3045950.61 frames. ], batch size: 61, lr: 2.46e-02, grad_scale: 32.0 2023-11-18 08:58:36,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=152560.0, ans=0.125 2023-11-18 08:58:44,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=152626.66666666666, ans=0.125 2023-11-18 08:58:47,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=152626.66666666666, ans=0.0 2023-11-18 08:58:55,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=152693.33333333334, ans=0.125 2023-11-18 08:58:59,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=152693.33333333334, ans=0.125 2023-11-18 08:59:00,938 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.68 vs. limit=15.0 2023-11-18 08:59:08,403 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.824e+01 1.080e+02 1.224e+02 1.410e+02 3.165e+02, threshold=2.449e+02, percent-clipped=2.0 2023-11-18 08:59:08,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=152760.0, ans=0.125 2023-11-18 08:59:10,617 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 08:59:11,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=152760.0, ans=0.125 2023-11-18 08:59:17,584 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 10900, loss[loss=0.133, simple_loss=0.1534, pruned_loss=0.04614, audio_tagging_loss=0.0102, over 14667.00 frames. ], tot_loss[loss=0.124, simple_loss=0.1322, pruned_loss=0.04551, audio_tagging_loss=0.01238, over 3054172.30 frames. ], batch size: 54, lr: 2.46e-02, grad_scale: 32.0 2023-11-18 08:59:38,464 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.58 vs. limit=15.0 2023-11-18 08:59:43,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=152960.0, ans=0.025 2023-11-18 08:59:53,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=153026.66666666666, ans=0.125 2023-11-18 08:59:59,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=153026.66666666666, ans=0.125 2023-11-18 09:00:14,340 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 10950, loss[loss=0.1222, simple_loss=0.1299, pruned_loss=0.04473, audio_tagging_loss=0.01256, over 14838.00 frames. ], tot_loss[loss=0.1239, simple_loss=0.1321, pruned_loss=0.0454, audio_tagging_loss=0.01248, over 3055326.89 frames. ], batch size: 56, lr: 2.46e-02, grad_scale: 32.0 2023-11-18 09:00:15,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=153160.0, ans=0.0 2023-11-18 09:00:37,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=153293.33333333334, ans=0.2 2023-11-18 09:00:38,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=153293.33333333334, ans=0.0 2023-11-18 09:00:47,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=153360.0, ans=0.0 2023-11-18 09:00:50,451 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=15.0 2023-11-18 09:00:53,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=153360.0, ans=0.125 2023-11-18 09:00:58,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=153360.0, ans=0.1 2023-11-18 09:01:02,028 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.175e+01 9.744e+01 1.111e+02 1.253e+02 1.675e+02, threshold=2.223e+02, percent-clipped=0.0 2023-11-18 09:01:04,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=153426.66666666666, ans=0.0 2023-11-18 09:01:10,527 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.56 vs. 
limit=10.0 2023-11-18 09:01:10,789 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 11000, loss[loss=0.1242, simple_loss=0.1247, pruned_loss=0.05009, audio_tagging_loss=0.0118, over 15084.00 frames. ], tot_loss[loss=0.1226, simple_loss=0.1305, pruned_loss=0.04476, audio_tagging_loss=0.01255, over 3042500.84 frames. ], batch size: 56, lr: 2.46e-02, grad_scale: 32.0 2023-11-18 09:01:12,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=153493.33333333334, ans=0.0 2023-11-18 09:01:15,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=153493.33333333334, ans=0.125 2023-11-18 09:01:17,741 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 09:01:47,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=153693.33333333334, ans=0.2 2023-11-18 09:01:52,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=153693.33333333334, ans=0.125 2023-11-18 09:02:06,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=153826.66666666666, ans=0.125 2023-11-18 09:02:06,752 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.91 vs. limit=22.5 2023-11-18 09:02:07,782 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 11050, loss[loss=0.09763, simple_loss=0.09209, pruned_loss=0.03324, audio_tagging_loss=0.01835, over 14845.00 frames. ], tot_loss[loss=0.1224, simple_loss=0.1298, pruned_loss=0.0447, audio_tagging_loss=0.0128, over 3039881.96 frames. ], batch size: 56, lr: 2.45e-02, grad_scale: 32.0 2023-11-18 09:02:11,542 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.81 vs. limit=15.0 2023-11-18 09:02:29,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=153960.0, ans=0.125 2023-11-18 09:02:33,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=153960.0, ans=6.0 2023-11-18 09:02:46,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=154026.66666666666, ans=0.125 2023-11-18 09:02:49,740 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.28 vs. 
limit=10.0 2023-11-18 09:02:54,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=154093.33333333334, ans=0.0 2023-11-18 09:02:55,029 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.060e+01 9.808e+01 1.104e+02 1.219e+02 2.392e+02, threshold=2.208e+02, percent-clipped=1.0 2023-11-18 09:02:57,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=154093.33333333334, ans=0.125 2023-11-18 09:03:04,800 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 11100, loss[loss=0.1216, simple_loss=0.1276, pruned_loss=0.04522, audio_tagging_loss=0.01254, over 15034.00 frames. ], tot_loss[loss=0.1232, simple_loss=0.1303, pruned_loss=0.045, audio_tagging_loss=0.01303, over 3051177.35 frames. ], batch size: 57, lr: 2.45e-02, grad_scale: 32.0 2023-11-18 09:03:07,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=154160.0, ans=0.125 2023-11-18 09:03:11,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=154160.0, ans=0.125 2023-11-18 09:03:18,235 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.87 vs. limit=15.0 2023-11-18 09:03:27,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=154293.33333333334, ans=10.0 2023-11-18 09:03:30,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=154293.33333333334, ans=0.0 2023-11-18 09:03:36,600 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.40 vs. limit=15.0 2023-11-18 09:03:37,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=154360.0, ans=10.0 2023-11-18 09:03:41,316 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.88 vs. limit=15.0 2023-11-18 09:03:51,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=154426.66666666666, ans=8.0 2023-11-18 09:04:00,573 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 11150, loss[loss=0.08439, simple_loss=0.0855, pruned_loss=0.02566, audio_tagging_loss=0.01598, over 14866.00 frames. ], tot_loss[loss=0.1232, simple_loss=0.13, pruned_loss=0.04508, audio_tagging_loss=0.01315, over 3057958.29 frames. 
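The `[optim.py:476]` entries above summarize gradient-norm statistics over recent batches: five values (min, 25%, 50%, 75%, max by the look of them), a clipping threshold, and the fraction of batches clipped. In every report the threshold is twice the logged median (for instance 2.208e+02 = 2.0 x 1.104e+02, with Clipping_scale=2.0), so the threshold appears to be derived as clipping_scale times the median grad-norm. A minimal sketch of that bookkeeping, assuming a simple buffer of recent norms; the names and the window handling are illustrative, not the optimizer's actual internals:

```python
import torch

def clipping_report(recent_norms: torch.Tensor, clipping_scale: float = 2.0) -> float:
    """Summarize a buffer of recent gradient norms the way the log lines do.

    Assumes threshold = clipping_scale * median, which matches the printed
    numbers (e.g. 2.0 * 1.104e+02 ~= 2.208e+02); percent-clipped is measured
    over the same buffer here, which is a simplification.
    """
    q = torch.quantile(recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * float(q[2])
    pct = 100.0 * float((recent_norms > threshold).float().mean())
    quart = " ".join(f"{v:.3e}" for v in q.tolist())
    print(f"Clipping_scale={clipping_scale}, grad-norm quartiles {quart}, "
          f"threshold={threshold:.3e}, percent-clipped={pct}")
    return threshold

# Norms shaped like one of the log lines above.
clipping_report(torch.tensor([80.6, 98.1, 110.4, 121.9, 239.2]))
```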
], batch size: 59, lr: 2.45e-02, grad_scale: 32.0 2023-11-18 09:04:07,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=154493.33333333334, ans=0.125 2023-11-18 09:04:34,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=154693.33333333334, ans=0.0 2023-11-18 09:04:39,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=154693.33333333334, ans=0.2 2023-11-18 09:04:39,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=154693.33333333334, ans=0.0 2023-11-18 09:04:48,103 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.195e+01 1.033e+02 1.136e+02 1.301e+02 2.057e+02, threshold=2.273e+02, percent-clipped=0.0 2023-11-18 09:04:50,814 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.20 vs. limit=15.0 2023-11-18 09:04:57,163 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 11200, loss[loss=0.1118, simple_loss=0.1272, pruned_loss=0.03585, audio_tagging_loss=0.01239, over 15649.00 frames. ], tot_loss[loss=0.1235, simple_loss=0.1305, pruned_loss=0.04512, audio_tagging_loss=0.01314, over 3057370.48 frames. ], batch size: 56, lr: 2.45e-02, grad_scale: 32.0 2023-11-18 09:05:00,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=154826.66666666666, ans=0.125 2023-11-18 09:05:04,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=154826.66666666666, ans=0.05 2023-11-18 09:05:21,761 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.49 vs. limit=22.5 2023-11-18 09:05:36,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=155026.66666666666, ans=0.125 2023-11-18 09:05:42,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=155093.33333333334, ans=10.0 2023-11-18 09:05:45,520 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.62 vs. limit=15.0 2023-11-18 09:05:53,931 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 11250, loss[loss=0.1158, simple_loss=0.1249, pruned_loss=0.04475, audio_tagging_loss=0.008537, over 16505.00 frames. ], tot_loss[loss=0.1241, simple_loss=0.1311, pruned_loss=0.04552, audio_tagging_loss=0.01299, over 3058765.73 frames. 
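Across these reports the headline `loss` is consistent with a fixed weighted sum of its parts: half the simple loss, plus the pruned loss, plus the audio-tagging loss. For the batch 11250 entry above, 0.5 x 0.1311 + 0.04552 + 0.01299 = 0.1241, matching the printed total. A sketch of that combination; the 0.5 and 1.0 weights are inferred from the logged arithmetic, not read out of the training script:

```python
def combined_loss(simple_loss: float, pruned_loss: float, audio_tagging_loss: float,
                  simple_scale: float = 0.5, audio_tagging_scale: float = 1.0) -> float:
    # Weighted sum consistent with the printed totals, e.g. batch 11250:
    # 0.5 * 0.1311 + 0.04552 + 1.0 * 0.01299 ~= 0.1241
    return (simple_scale * simple_loss
            + pruned_loss
            + audio_tagging_scale * audio_tagging_loss)

assert abs(combined_loss(0.1311, 0.04552, 0.01299) - 0.1241) < 5e-4
```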
], batch size: 64, lr: 2.44e-02, grad_scale: 32.0 2023-11-18 09:06:11,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=155226.66666666666, ans=0.125 2023-11-18 09:06:14,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=155293.33333333334, ans=0.125 2023-11-18 09:06:17,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=155293.33333333334, ans=0.0 2023-11-18 09:06:22,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=155293.33333333334, ans=0.125 2023-11-18 09:06:30,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=155360.0, ans=0.125 2023-11-18 09:06:32,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=155360.0, ans=0.0 2023-11-18 09:06:41,312 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.020e+01 9.682e+01 1.104e+02 1.218e+02 1.906e+02, threshold=2.209e+02, percent-clipped=0.0 2023-11-18 09:06:45,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=155426.66666666666, ans=0.125 2023-11-18 09:06:46,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=155426.66666666666, ans=0.125 2023-11-18 09:06:47,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=155426.66666666666, ans=0.1 2023-11-18 09:06:49,992 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 11300, loss[loss=0.1877, simple_loss=0.1961, pruned_loss=0.07933, audio_tagging_loss=0.01032, over 16989.00 frames. ], tot_loss[loss=0.1253, simple_loss=0.1329, pruned_loss=0.0462, audio_tagging_loss=0.01269, over 3057116.62 frames. ], batch size: 59, lr: 2.44e-02, grad_scale: 32.0 2023-11-18 09:06:51,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=155493.33333333334, ans=0.0 2023-11-18 09:06:58,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=155493.33333333334, ans=0.125 2023-11-18 09:06:58,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=155493.33333333334, ans=0.05 2023-11-18 09:07:00,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=155560.0, ans=0.1 2023-11-18 09:07:07,454 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.46 vs. 
limit=15.0 2023-11-18 09:07:16,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=155626.66666666666, ans=0.1 2023-11-18 09:07:38,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=155760.0, ans=0.125 2023-11-18 09:07:42,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=155760.0, ans=0.0 2023-11-18 09:07:45,915 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 11350, loss[loss=0.1415, simple_loss=0.1503, pruned_loss=0.05677, audio_tagging_loss=0.00959, over 15238.00 frames. ], tot_loss[loss=0.1263, simple_loss=0.1349, pruned_loss=0.04658, audio_tagging_loss=0.01227, over 3061342.88 frames. ], batch size: 56, lr: 2.44e-02, grad_scale: 32.0 2023-11-18 09:07:51,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=155826.66666666666, ans=0.0 2023-11-18 09:08:06,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=155893.33333333334, ans=0.125 2023-11-18 09:08:14,188 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.347e+00 2023-11-18 09:08:14,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=155960.0, ans=0.125 2023-11-18 09:08:33,862 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.925e+01 1.029e+02 1.095e+02 1.224e+02 1.585e+02, threshold=2.190e+02, percent-clipped=0.0 2023-11-18 09:08:39,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=156093.33333333334, ans=0.125 2023-11-18 09:08:43,605 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 11400, loss[loss=0.1031, simple_loss=0.1112, pruned_loss=0.03402, audio_tagging_loss=0.01351, over 15179.00 frames. ], tot_loss[loss=0.1262, simple_loss=0.1351, pruned_loss=0.04644, audio_tagging_loss=0.01222, over 3059424.07 frames. ], batch size: 56, lr: 2.44e-02, grad_scale: 32.0 2023-11-18 09:09:02,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=156226.66666666666, ans=0.0 2023-11-18 09:09:10,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=156293.33333333334, ans=0.0 2023-11-18 09:09:21,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=156360.0, ans=0.125 2023-11-18 09:09:39,853 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 11450, loss[loss=0.1193, simple_loss=0.1226, pruned_loss=0.04512, audio_tagging_loss=0.01289, over 15198.00 frames. ], tot_loss[loss=0.1252, simple_loss=0.1338, pruned_loss=0.04608, audio_tagging_loss=0.01223, over 3053955.82 frames. 
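The `[scaling.py:1022]` "Whitening" lines compare a per-module isotropy statistic against a limit (e.g. metric=5.46 vs. limit=15.0 just above); modules whose activations drift far from a white, identity-covariance distribution get pushed back. A single-group sketch of such a metric, normalized so it is 1.0 for perfectly white features and grows toward the channel count as the covariance collapses onto fewer directions; the multi-group bookkeeping and the corrective gradient applied past the limit are omitted, and the exact normalization is an assumption:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """Isotropy of a (num_frames, num_channels) activation matrix.

    Equals 1.0 when the centered covariance is proportional to the identity
    and approaches num_channels when the energy concentrates in one direction:
    d * tr(C^2) / tr(C)^2, written below via the squared entries of C.
    """
    x = x.reshape(-1, x.shape[-1]).float()
    x = x - x.mean(dim=0, keepdim=True)      # center the features
    cov = x.t() @ x / x.shape[0]             # channel covariance, (d, d)
    d = cov.shape[0]
    return float(d * (cov ** 2).sum() / (cov.diag().sum() ** 2 + 1e-20))

torch.manual_seed(0)
print(whitening_metric(torch.randn(2000, 256)))  # ~1.1: near-white noise
```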
], batch size: 56, lr: 2.43e-02, grad_scale: 32.0 2023-11-18 09:09:47,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=156493.33333333334, ans=0.2 2023-11-18 09:10:02,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=156626.66666666666, ans=0.0 2023-11-18 09:10:08,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=156626.66666666666, ans=0.125 2023-11-18 09:10:10,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=156626.66666666666, ans=0.125 2023-11-18 09:10:16,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=156693.33333333334, ans=0.125 2023-11-18 09:10:26,555 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.266e+01 9.860e+01 1.075e+02 1.215e+02 1.820e+02, threshold=2.151e+02, percent-clipped=0.0 2023-11-18 09:10:35,117 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 11500, loss[loss=0.1176, simple_loss=0.1268, pruned_loss=0.04073, audio_tagging_loss=0.0135, over 15104.00 frames. ], tot_loss[loss=0.1243, simple_loss=0.1327, pruned_loss=0.04579, audio_tagging_loss=0.01216, over 3049593.53 frames. ], batch size: 56, lr: 2.43e-02, grad_scale: 32.0 2023-11-18 09:10:40,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=156826.66666666666, ans=0.0 2023-11-18 09:10:40,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=156826.66666666666, ans=0.0 2023-11-18 09:10:49,101 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:10:58,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=156960.0, ans=0.1 2023-11-18 09:11:31,803 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 11550, loss[loss=0.1534, simple_loss=0.1556, pruned_loss=0.06256, audio_tagging_loss=0.01303, over 15152.00 frames. ], tot_loss[loss=0.1238, simple_loss=0.1319, pruned_loss=0.04566, audio_tagging_loss=0.01219, over 3048477.85 frames. ], batch size: 55, lr: 2.43e-02, grad_scale: 32.0 2023-11-18 09:11:32,441 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.29 vs. limit=22.5 2023-11-18 09:11:33,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=157160.0, ans=0.1 2023-11-18 09:11:42,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=157226.66666666666, ans=0.04949747468305833 2023-11-18 09:11:54,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=157293.33333333334, ans=0.0 2023-11-18 09:11:58,823 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.78 vs. limit=15.0 2023-11-18 09:12:01,701 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 09:12:07,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=157360.0, ans=0.1 2023-11-18 09:12:12,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=157360.0, ans=0.125 2023-11-18 09:12:19,807 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.118e+01 1.012e+02 1.136e+02 1.340e+02 1.723e+02, threshold=2.272e+02, percent-clipped=0.0 2023-11-18 09:12:22,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=157426.66666666666, ans=0.125 2023-11-18 09:12:28,351 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 11600, loss[loss=0.09071, simple_loss=0.09184, pruned_loss=0.02944, audio_tagging_loss=0.01535, over 14382.00 frames. ], tot_loss[loss=0.1245, simple_loss=0.1325, pruned_loss=0.04603, audio_tagging_loss=0.01221, over 3047368.90 frames. ], batch size: 56, lr: 2.43e-02, grad_scale: 32.0 2023-11-18 09:12:30,178 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.64 vs. limit=6.0 2023-11-18 09:12:48,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=157560.0, ans=0.0 2023-11-18 09:13:13,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=157760.0, ans=0.125 2023-11-18 09:13:23,777 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 11650, loss[loss=0.08773, simple_loss=0.0888, pruned_loss=0.02921, audio_tagging_loss=0.01412, over 15885.00 frames. ], tot_loss[loss=0.1241, simple_loss=0.1325, pruned_loss=0.04565, audio_tagging_loss=0.01222, over 3048760.68 frames. ], batch size: 66, lr: 2.42e-02, grad_scale: 32.0 2023-11-18 09:13:58,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=158026.66666666666, ans=0.125 2023-11-18 09:14:10,052 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.668e+01 1.030e+02 1.124e+02 1.249e+02 1.579e+02, threshold=2.249e+02, percent-clipped=0.0 2023-11-18 09:14:19,002 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 11700, loss[loss=0.1503, simple_loss=0.1652, pruned_loss=0.05744, audio_tagging_loss=0.01031, over 14788.00 frames. ], tot_loss[loss=0.1253, simple_loss=0.1337, pruned_loss=0.04628, audio_tagging_loss=0.01224, over 3041563.41 frames. ], batch size: 53, lr: 2.42e-02, grad_scale: 32.0 2023-11-18 09:14:19,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=158160.0, ans=0.0 2023-11-18 09:14:22,214 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.35 vs. limit=15.0 2023-11-18 09:14:22,465 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.01 vs. 
limit=8.0 2023-11-18 09:14:26,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=158160.0, ans=0.125 2023-11-18 09:14:28,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=158160.0, ans=0.125 2023-11-18 09:14:38,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=158226.66666666666, ans=0.125 2023-11-18 09:14:51,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=158360.0, ans=0.125 2023-11-18 09:14:52,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=158360.0, ans=10.0 2023-11-18 09:14:59,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=158360.0, ans=0.0 2023-11-18 09:15:15,190 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 11750, loss[loss=0.1586, simple_loss=0.1635, pruned_loss=0.06632, audio_tagging_loss=0.0105, over 14634.00 frames. ], tot_loss[loss=0.1251, simple_loss=0.1332, pruned_loss=0.04617, audio_tagging_loss=0.0123, over 3033018.16 frames. ], batch size: 54, lr: 2.42e-02, grad_scale: 32.0 2023-11-18 09:15:25,915 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=9.430e-01 2023-11-18 09:15:36,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=158626.66666666666, ans=0.025 2023-11-18 09:15:45,661 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.91 vs. limit=5.0 2023-11-18 09:16:01,617 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 9.907e+01 1.124e+02 1.266e+02 1.981e+02, threshold=2.248e+02, percent-clipped=0.0 2023-11-18 09:16:10,011 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 11800, loss[loss=0.08528, simple_loss=0.08198, pruned_loss=0.02405, audio_tagging_loss=0.02024, over 14547.00 frames. ], tot_loss[loss=0.1241, simple_loss=0.1318, pruned_loss=0.04567, audio_tagging_loss=0.01251, over 3039414.27 frames. ], batch size: 56, lr: 2.42e-02, grad_scale: 32.0 2023-11-18 09:16:16,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=158826.66666666666, ans=0.125 2023-11-18 09:16:24,910 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.23 vs. limit=6.0 2023-11-18 09:16:54,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=159093.33333333334, ans=0.125 2023-11-18 09:17:04,108 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.37 vs. limit=15.0 2023-11-18 09:17:05,604 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 11850, loss[loss=0.09937, simple_loss=0.1024, pruned_loss=0.03233, audio_tagging_loss=0.01584, over 14657.00 frames. ], tot_loss[loss=0.1247, simple_loss=0.1321, pruned_loss=0.0459, audio_tagging_loss=0.01273, over 3042949.73 frames. 
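The `[train_asr.py:1319]` warnings above drop AudioSet cuts whose placeholder transcripts cannot be aligned: a 1-second cut gives 100 feature frames, the encoder frontend reduces that to 23, and 23 frames cannot emit the 24 BPE tokens of the dummy text, so the transducer loss would be undefined. A sketch of such a filter; the `((T - 7) // 2 + 1) // 2` frontend arithmetic is one formula consistent with the logged 100 -> 23 reduction, not necessarily the exact one used:

```python
import logging

def keep_cut(num_frames: int, tokens: list, cut_id: str) -> bool:
    """Return False (and warn) when a cut is too short for its token sequence."""
    # Frames surviving the convolutional frontend; chosen to reproduce the
    # logged 100 -> 23 reduction (an assumption about the exact subsampling).
    t = ((num_frames - 7) // 2 + 1) // 2
    if t < len(tokens):
        logging.warning(
            f"Exclude cut with ID {cut_id} from training. "
            f"Number of frames (before subsampling): {num_frames}. "
            f"Number of frames (after subsampling): {t}. "
            f"Number of tokens: {len(tokens)}"
        )
        return False
    return True

# The case in the log: a 100-frame cut vs. the 24-token dummy transcript.
keep_cut(100, ["tok"] * 24, "unbalanced/NeYOsnhOi4k_0.000_1.000.wav")
```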
], batch size: 57, lr: 2.42e-02, grad_scale: 32.0 2023-11-18 09:17:21,129 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.16 vs. limit=22.5 2023-11-18 09:17:30,057 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.52 vs. limit=22.5 2023-11-18 09:17:41,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=159360.0, ans=0.125 2023-11-18 09:17:52,692 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.196e+01 1.014e+02 1.138e+02 1.282e+02 2.288e+02, threshold=2.275e+02, percent-clipped=1.0 2023-11-18 09:18:01,627 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 11900, loss[loss=0.09185, simple_loss=0.1035, pruned_loss=0.02792, audio_tagging_loss=0.01219, over 13658.00 frames. ], tot_loss[loss=0.1244, simple_loss=0.1318, pruned_loss=0.04562, audio_tagging_loss=0.01284, over 3039240.74 frames. ], batch size: 54, lr: 2.41e-02, grad_scale: 32.0 2023-11-18 09:18:04,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=159493.33333333334, ans=0.125 2023-11-18 09:18:06,403 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.06 vs. limit=22.5 2023-11-18 09:18:16,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=159560.0, ans=0.125 2023-11-18 09:18:17,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=159560.0, ans=0.0 2023-11-18 09:18:18,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=159560.0, ans=0.125 2023-11-18 09:18:21,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=159560.0, ans=0.125 2023-11-18 09:18:26,214 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.67 vs. limit=15.0 2023-11-18 09:18:32,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=159626.66666666666, ans=0.1 2023-11-18 09:18:56,790 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 11950, loss[loss=0.1042, simple_loss=0.1194, pruned_loss=0.0348, audio_tagging_loss=0.009725, over 15386.00 frames. ], tot_loss[loss=0.1235, simple_loss=0.131, pruned_loss=0.04503, audio_tagging_loss=0.01297, over 3034892.68 frames. ], batch size: 57, lr: 2.41e-02, grad_scale: 32.0 2023-11-18 09:19:03,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=159826.66666666666, ans=0.95 2023-11-18 09:19:29,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=159960.0, ans=0.125 2023-11-18 09:19:39,536 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.95 vs. 
limit=15.0 2023-11-18 09:19:40,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=160026.66666666666, ans=0.1 2023-11-18 09:19:44,164 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.182e+01 9.874e+01 1.073e+02 1.187e+02 1.717e+02, threshold=2.145e+02, percent-clipped=0.0 2023-11-18 09:19:52,773 INFO [train_asr.py:1115] (3/4) Epoch 2, batch 12000, loss[loss=0.109, simple_loss=0.1264, pruned_loss=0.03102, audio_tagging_loss=0.01478, over 15705.00 frames. ], tot_loss[loss=0.1243, simple_loss=0.1318, pruned_loss=0.04551, audio_tagging_loss=0.01289, over 3038905.98 frames. ], batch size: 57, lr: 2.41e-02, grad_scale: 32.0 2023-11-18 09:19:52,773 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-18 09:20:22,983 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.6747, 3.7881, 4.3937, 3.5952], device='cuda:3') 2023-11-18 09:20:26,774 INFO [train_asr.py:1147] (3/4) Epoch 2, validation: loss=0.08437, simple_loss=0.06733, pruned_loss=0.01363, audio_tagging_loss=0.03708, over 4681554.00 frames. 2023-11-18 09:20:26,774 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-18 09:20:31,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=160160.0, ans=0.125 2023-11-18 09:21:16,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=160300.0, ans=0.0 2023-11-18 09:21:26,868 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 0, loss[loss=0.08907, simple_loss=0.07456, pruned_loss=0.02, audio_tagging_loss=0.03179, over 15046.00 frames. ], tot_loss[loss=0.08907, simple_loss=0.07456, pruned_loss=0.02, audio_tagging_loss=0.03179, over 15046.00 frames. ], batch size: 57, lr: 2.29e-02, grad_scale: 32.0 2023-11-18 09:21:26,868 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-18 09:21:58,052 INFO [train_asr.py:1147] (3/4) Epoch 3, validation: loss=0.08217, simple_loss=0.06725, pruned_loss=0.01375, audio_tagging_loss=0.03479, over 4681554.00 frames. 2023-11-18 09:21:58,053 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-18 09:22:03,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=160300.0, ans=0.125 2023-11-18 09:22:06,334 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.34 vs. limit=22.5 2023-11-18 09:22:07,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=160300.0, ans=0.125 2023-11-18 09:22:09,514 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=19.83 vs. limit=15.0 2023-11-18 09:22:29,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=160433.33333333334, ans=0.0 2023-11-18 09:22:32,031 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.54 vs. limit=15.0 2023-11-18 09:22:32,882 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.39 vs. 
limit=12.0 2023-11-18 09:22:53,028 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 50, loss[loss=0.1461, simple_loss=0.1407, pruned_loss=0.05003, audio_tagging_loss=0.02574, over 16597.00 frames. ], tot_loss[loss=0.1321, simple_loss=0.1271, pruned_loss=0.04375, audio_tagging_loss=0.02481, over 686015.07 frames. ], batch size: 61, lr: 2.29e-02, grad_scale: 32.0 2023-11-18 09:22:54,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=160633.33333333334, ans=0.2 2023-11-18 09:22:57,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=160633.33333333334, ans=0.2 2023-11-18 09:22:57,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=160633.33333333334, ans=0.125 2023-11-18 09:23:05,498 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.23 vs. limit=15.0 2023-11-18 09:23:16,044 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.190e+01 1.036e+02 1.137e+02 1.326e+02 1.917e+02, threshold=2.275e+02, percent-clipped=0.0 2023-11-18 09:23:21,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=160766.66666666666, ans=0.0 2023-11-18 09:23:36,997 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=6.536e-01 2023-11-18 09:23:45,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=160900.0, ans=0.125 2023-11-18 09:23:47,689 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 100, loss[loss=0.1427, simple_loss=0.1559, pruned_loss=0.0485, audio_tagging_loss=0.01628, over 16183.00 frames. ], tot_loss[loss=0.1303, simple_loss=0.1278, pruned_loss=0.04289, audio_tagging_loss=0.02352, over 1207907.14 frames. ], batch size: 57, lr: 2.28e-02, grad_scale: 64.0 2023-11-18 09:23:47,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=160966.66666666666, ans=0.125 2023-11-18 09:24:15,761 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:24:37,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=161233.33333333334, ans=0.125 2023-11-18 09:24:43,644 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 150, loss[loss=0.159, simple_loss=0.1777, pruned_loss=0.05772, audio_tagging_loss=0.01242, over 15225.00 frames. ], tot_loss[loss=0.1302, simple_loss=0.131, pruned_loss=0.0439, audio_tagging_loss=0.02081, over 1618049.93 frames. 
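The validation blocks also print `attn_weights_entropy` tensors (one value per attention head, as in the four-element tensor logged at 09:20:22 above); high, uniform entropies early in training indicate heads that have not yet specialized. A sketch of that diagnostic; the exact reduction over batch and positions is an assumption:

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    """Per-head entropy of attention weights.

    attn: (num_heads, batch, tgt_len, src_len) with rows summing to 1.
    Returns num_heads values, like the logged four-element tensors; averaging
    over batch and target positions is an assumed reduction.
    """
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (heads, batch, tgt)
    return ent.mean(dim=(1, 2))

weights = torch.softmax(torch.randn(4, 2, 50, 50), dim=-1)
print(attn_weights_entropy(weights))  # 4 values, one per head; <= log(50) each
```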
], batch size: 54, lr: 2.28e-02, grad_scale: 64.0 2023-11-18 09:25:06,615 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.947e+01 1.007e+02 1.136e+02 1.298e+02 1.875e+02, threshold=2.273e+02, percent-clipped=0.0 2023-11-18 09:25:08,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=161433.33333333334, ans=0.0 2023-11-18 09:25:29,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=161566.66666666666, ans=0.0 2023-11-18 09:25:39,262 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 200, loss[loss=0.1628, simple_loss=0.17, pruned_loss=0.06772, audio_tagging_loss=0.01007, over 14576.00 frames. ], tot_loss[loss=0.1281, simple_loss=0.1309, pruned_loss=0.04429, audio_tagging_loss=0.01833, over 1931704.83 frames. ], batch size: 56, lr: 2.28e-02, grad_scale: 64.0 2023-11-18 09:25:39,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=161633.33333333334, ans=0.125 2023-11-18 09:25:58,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=161700.0, ans=0.2 2023-11-18 09:26:07,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=161766.66666666666, ans=0.125 2023-11-18 09:26:34,565 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 250, loss[loss=0.1716, simple_loss=0.1789, pruned_loss=0.07141, audio_tagging_loss=0.01068, over 14482.00 frames. ], tot_loss[loss=0.1282, simple_loss=0.1334, pruned_loss=0.0451, audio_tagging_loss=0.0164, over 2181857.35 frames. ], batch size: 54, lr: 2.28e-02, grad_scale: 64.0 2023-11-18 09:26:44,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=161966.66666666666, ans=0.125 2023-11-18 09:26:44,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=161966.66666666666, ans=0.0 2023-11-18 09:26:45,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=162033.33333333334, ans=0.025 2023-11-18 09:26:57,823 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.024e+01 1.002e+02 1.144e+02 1.310e+02 1.731e+02, threshold=2.288e+02, percent-clipped=0.0 2023-11-18 09:27:10,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=162166.66666666666, ans=15.0 2023-11-18 09:27:26,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=162233.33333333334, ans=0.125 2023-11-18 09:27:26,830 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=22.5 2023-11-18 09:27:30,582 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 300, loss[loss=0.1304, simple_loss=0.1395, pruned_loss=0.0485, audio_tagging_loss=0.01213, over 14922.00 frames. ], tot_loss[loss=0.1292, simple_loss=0.136, pruned_loss=0.04605, audio_tagging_loss=0.0151, over 2374483.64 frames. 
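The ubiquitous `[scaling.py:213]` entries trace `ScheduledFloat` values: regularization knobs (skip rates, balancer probabilities, dropout) that are deterministic functions of `batch_count`, which is why so many of them sit at round values like 0.0, 0.125, or 0.2 this far into training. A piecewise-linear sketch of such a schedule; the knot API and the example knot values are illustrative:

```python
class ScheduledFloat:
    """Piecewise-linear schedule over batch_count, clamped at the end knots."""

    def __init__(self, *knots):
        self.knots = list(knots)  # (batch_count, value) pairs, increasing

    def value(self, batch_count: float) -> float:
        (x0, y0) = self.knots[0]
        if batch_count <= x0:
            return y0
        for (x1, y1) in self.knots[1:]:
            if batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
            (x0, y0) = (x1, y1)
        return y0

# A skip-rate decaying to zero over the first 20k batches would read
# 'ans=0.0' at batch_count=153360.0, like many entries above (the knot
# values here are made up for illustration).
skip_rate = ScheduledFloat((0.0, 0.1), (20000.0, 0.0))
print(skip_rate.value(153360.0))  # -> 0.0
```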
], batch size: 56, lr: 2.28e-02, grad_scale: 64.0 2023-11-18 09:27:36,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=162300.0, ans=0.1 2023-11-18 09:27:45,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=162366.66666666666, ans=0.0 2023-11-18 09:27:57,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=162433.33333333334, ans=0.0 2023-11-18 09:27:57,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=162433.33333333334, ans=0.0 2023-11-18 09:28:22,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=162566.66666666666, ans=0.2 2023-11-18 09:28:22,778 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.45 vs. limit=22.5 2023-11-18 09:28:25,387 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 350, loss[loss=0.1059, simple_loss=0.1021, pruned_loss=0.04002, audio_tagging_loss=0.0148, over 14368.00 frames. ], tot_loss[loss=0.1267, simple_loss=0.1342, pruned_loss=0.0452, audio_tagging_loss=0.01439, over 2527741.61 frames. ], batch size: 56, lr: 2.27e-02, grad_scale: 64.0 2023-11-18 09:28:26,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=162633.33333333334, ans=0.125 2023-11-18 09:28:49,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=162766.66666666666, ans=0.0 2023-11-18 09:28:50,147 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.758e+01 9.861e+01 1.085e+02 1.214e+02 1.858e+02, threshold=2.170e+02, percent-clipped=0.0 2023-11-18 09:28:59,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=162833.33333333334, ans=0.125 2023-11-18 09:29:08,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=162833.33333333334, ans=0.1 2023-11-18 09:29:21,420 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 400, loss[loss=0.1257, simple_loss=0.1308, pruned_loss=0.04597, audio_tagging_loss=0.01437, over 14376.00 frames. ], tot_loss[loss=0.1245, simple_loss=0.1319, pruned_loss=0.04454, audio_tagging_loss=0.014, over 2639889.55 frames. ], batch size: 54, lr: 2.27e-02, grad_scale: 64.0 2023-11-18 09:29:24,665 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.20 vs. 
limit=22.5 2023-11-18 09:29:31,794 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:29:32,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=163033.33333333334, ans=0.2 2023-11-18 09:29:33,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=163033.33333333334, ans=0.1 2023-11-18 09:29:42,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=163033.33333333334, ans=0.125 2023-11-18 09:30:18,228 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 450, loss[loss=0.1009, simple_loss=0.1095, pruned_loss=0.03474, audio_tagging_loss=0.01139, over 15005.00 frames. ], tot_loss[loss=0.1232, simple_loss=0.1308, pruned_loss=0.04423, audio_tagging_loss=0.01358, over 2732944.60 frames. ], batch size: 57, lr: 2.27e-02, grad_scale: 64.0 2023-11-18 09:30:40,029 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.316e+01 9.836e+01 1.125e+02 1.262e+02 2.640e+02, threshold=2.251e+02, percent-clipped=1.0 2023-11-18 09:31:05,051 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.80 vs. limit=15.0 2023-11-18 09:31:06,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=163566.66666666666, ans=0.0 2023-11-18 09:31:12,858 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 500, loss[loss=0.1409, simple_loss=0.1461, pruned_loss=0.05684, audio_tagging_loss=0.01099, over 15569.00 frames. ], tot_loss[loss=0.1219, simple_loss=0.1299, pruned_loss=0.04375, audio_tagging_loss=0.01326, over 2797175.76 frames. ], batch size: 59, lr: 2.27e-02, grad_scale: 64.0 2023-11-18 09:31:27,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=163700.0, ans=0.125 2023-11-18 09:31:31,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=163700.0, ans=0.125 2023-11-18 09:32:07,424 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 550, loss[loss=0.1007, simple_loss=0.1053, pruned_loss=0.03416, audio_tagging_loss=0.01389, over 14306.00 frames. ], tot_loss[loss=0.1205, simple_loss=0.1287, pruned_loss=0.04306, audio_tagging_loss=0.01307, over 2849963.12 frames. 
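Each progress line reports both the current batch's loss and a `tot_loss` aggregate "over N frames", where N hovers around 3e6 rather than growing without bound; that is the signature of an exponentially decayed, frame-weighted average rather than a cumulative one. A sketch of that bookkeeping; the decay constant 0.995 is an illustrative choice (it gives an effective window of about 200 batches, roughly 3e6 frames at these batch sizes), not the trainer's actual constant:

```python
class RunningLoss:
    """Decayed frame-weighted aggregate behind 'tot_loss[...] over N frames'."""

    def __init__(self, decay: float = 0.995):
        self.decay = decay       # illustrative; ~200-batch effective window
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> None:
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tot = RunningLoss()
for _ in range(1000):
    tot.update(batch_loss=0.12, batch_frames=15000.0)
print(f"{tot.value:.4f}, over {tot.frames:.2f} frames")  # frames plateau near 3e6
```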
], batch size: 54, lr: 2.26e-02, grad_scale: 64.0 2023-11-18 09:32:11,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=163966.66666666666, ans=0.0 2023-11-18 09:32:31,486 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.725e+01 9.566e+01 1.089e+02 1.252e+02 1.679e+02, threshold=2.177e+02, percent-clipped=0.0 2023-11-18 09:32:44,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=164166.66666666666, ans=0.125 2023-11-18 09:32:47,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=164166.66666666666, ans=0.125 2023-11-18 09:32:54,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=164233.33333333334, ans=0.0 2023-11-18 09:32:56,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=164233.33333333334, ans=0.125 2023-11-18 09:33:03,671 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 600, loss[loss=0.1277, simple_loss=0.1562, pruned_loss=0.04052, audio_tagging_loss=0.009128, over 14900.00 frames. ], tot_loss[loss=0.1213, simple_loss=0.1298, pruned_loss=0.04347, audio_tagging_loss=0.0129, over 2898007.47 frames. ], batch size: 54, lr: 2.26e-02, grad_scale: 64.0 2023-11-18 09:33:09,420 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.57 vs. limit=15.0 2023-11-18 09:33:09,542 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=12.08 vs. limit=12.0 2023-11-18 09:33:24,370 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.02 vs. limit=12.0 2023-11-18 09:33:38,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=164500.0, ans=0.125 2023-11-18 09:33:51,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=164566.66666666666, ans=0.1 2023-11-18 09:33:57,686 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 650, loss[loss=0.1296, simple_loss=0.1346, pruned_loss=0.0505, audio_tagging_loss=0.0118, over 14137.00 frames. ], tot_loss[loss=0.1206, simple_loss=0.1287, pruned_loss=0.04331, audio_tagging_loss=0.01293, over 2927037.50 frames. ], batch size: 55, lr: 2.26e-02, grad_scale: 64.0 2023-11-18 09:34:02,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=164633.33333333334, ans=0.1 2023-11-18 09:34:11,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=164700.0, ans=0.0 2023-11-18 09:34:11,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=164700.0, ans=0.05 2023-11-18 09:34:20,738 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.536e+01 9.683e+01 1.100e+02 1.220e+02 1.764e+02, threshold=2.199e+02, percent-clipped=0.0 2023-11-18 09:34:48,701 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.91 vs. 
limit=15.0 2023-11-18 09:34:52,276 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 700, loss[loss=0.1245, simple_loss=0.1342, pruned_loss=0.04511, audio_tagging_loss=0.01231, over 15474.00 frames. ], tot_loss[loss=0.1211, simple_loss=0.1293, pruned_loss=0.04367, audio_tagging_loss=0.01277, over 2948232.23 frames. ], batch size: 56, lr: 2.26e-02, grad_scale: 64.0 2023-11-18 09:34:59,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=164966.66666666666, ans=0.0 2023-11-18 09:35:06,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=165033.33333333334, ans=0.125 2023-11-18 09:35:16,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=165100.0, ans=0.1 2023-11-18 09:35:33,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=165166.66666666666, ans=0.0 2023-11-18 09:35:37,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=165233.33333333334, ans=0.0 2023-11-18 09:35:39,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=165233.33333333334, ans=0.2 2023-11-18 09:35:49,129 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 750, loss[loss=0.1242, simple_loss=0.1281, pruned_loss=0.04617, audio_tagging_loss=0.01398, over 15044.00 frames. ], tot_loss[loss=0.1224, simple_loss=0.1307, pruned_loss=0.04436, audio_tagging_loss=0.01272, over 2974128.54 frames. ], batch size: 56, lr: 2.26e-02, grad_scale: 64.0 2023-11-18 09:36:01,636 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.67 vs. limit=15.0 2023-11-18 09:36:11,471 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.593e+01 1.007e+02 1.126e+02 1.277e+02 1.870e+02, threshold=2.252e+02, percent-clipped=0.0 2023-11-18 09:36:15,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=165433.33333333334, ans=0.125 2023-11-18 09:36:21,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=165500.0, ans=0.125 2023-11-18 09:36:42,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=165566.66666666666, ans=0.125 2023-11-18 09:36:44,358 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 800, loss[loss=0.1198, simple_loss=0.1376, pruned_loss=0.04245, audio_tagging_loss=0.008555, over 15680.00 frames. ], tot_loss[loss=0.1224, simple_loss=0.1312, pruned_loss=0.04423, audio_tagging_loss=0.01262, over 2998289.45 frames. ], batch size: 58, lr: 2.25e-02, grad_scale: 64.0 2023-11-18 09:37:20,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=165833.33333333334, ans=0.0 2023-11-18 09:37:39,022 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 850, loss[loss=0.1048, simple_loss=0.1163, pruned_loss=0.03457, audio_tagging_loss=0.0121, over 14506.00 frames. ], tot_loss[loss=0.1222, simple_loss=0.1306, pruned_loss=0.04416, audio_tagging_loss=0.01275, over 3008320.61 frames. 
], batch size: 56, lr: 2.25e-02, grad_scale: 64.0 2023-11-18 09:37:54,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=166033.33333333334, ans=0.1 2023-11-18 09:37:58,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=166033.33333333334, ans=0.125 2023-11-18 09:38:03,435 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.295e+01 1.046e+02 1.125e+02 1.279e+02 2.412e+02, threshold=2.250e+02, percent-clipped=1.0 2023-11-18 09:38:05,233 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.70 vs. limit=10.0 2023-11-18 09:38:17,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=166166.66666666666, ans=15.0 2023-11-18 09:38:18,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=166166.66666666666, ans=0.0 2023-11-18 09:38:35,540 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 900, loss[loss=0.1791, simple_loss=0.2039, pruned_loss=0.06525, audio_tagging_loss=0.01187, over 15161.00 frames. ], tot_loss[loss=0.1219, simple_loss=0.1305, pruned_loss=0.04391, audio_tagging_loss=0.01278, over 3012792.22 frames. ], batch size: 55, lr: 2.25e-02, grad_scale: 64.0 2023-11-18 09:38:39,257 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.35 vs. limit=15.0 2023-11-18 09:38:48,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=166366.66666666666, ans=12.0 2023-11-18 09:39:01,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=166433.33333333334, ans=0.2 2023-11-18 09:39:12,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=166500.0, ans=0.07 2023-11-18 09:39:23,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=166566.66666666666, ans=0.1 2023-11-18 09:39:31,298 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 950, loss[loss=0.08333, simple_loss=0.08006, pruned_loss=0.02678, audio_tagging_loss=0.01652, over 13695.00 frames. ], tot_loss[loss=0.1222, simple_loss=0.1312, pruned_loss=0.04409, audio_tagging_loss=0.01246, over 3027287.01 frames. ], batch size: 53, lr: 2.25e-02, grad_scale: 64.0 2023-11-18 09:39:54,127 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.724e+01 9.509e+01 1.090e+02 1.237e+02 1.820e+02, threshold=2.179e+02, percent-clipped=0.0 2023-11-18 09:39:58,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=166766.66666666666, ans=0.125 2023-11-18 09:40:04,156 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.42 vs. 
limit=12.0 2023-11-18 09:40:20,152 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:40:26,282 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 1000, loss[loss=0.1295, simple_loss=0.1411, pruned_loss=0.04718, audio_tagging_loss=0.01172, over 15704.00 frames. ], tot_loss[loss=0.1222, simple_loss=0.1314, pruned_loss=0.0442, audio_tagging_loss=0.01226, over 3033532.97 frames. ], batch size: 59, lr: 2.25e-02, grad_scale: 64.0 2023-11-18 09:40:29,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=166966.66666666666, ans=0.0 2023-11-18 09:40:34,159 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.47 vs. limit=6.0 2023-11-18 09:40:50,192 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 09:41:00,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=167166.66666666666, ans=0.125 2023-11-18 09:41:04,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=167166.66666666666, ans=0.09899494936611666 2023-11-18 09:41:22,338 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 1050, loss[loss=0.102, simple_loss=0.1081, pruned_loss=0.03588, audio_tagging_loss=0.01208, over 16293.00 frames. ], tot_loss[loss=0.1213, simple_loss=0.1305, pruned_loss=0.0439, audio_tagging_loss=0.0122, over 3040476.91 frames. ], batch size: 63, lr: 2.24e-02, grad_scale: 64.0 2023-11-18 09:41:30,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=167300.0, ans=0.0 2023-11-18 09:41:45,155 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=15.0 2023-11-18 09:41:45,461 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.835e+01 9.727e+01 1.056e+02 1.215e+02 1.619e+02, threshold=2.112e+02, percent-clipped=0.0 2023-11-18 09:42:01,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=167500.0, ans=0.2 2023-11-18 09:42:15,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=167566.66666666666, ans=0.125 2023-11-18 09:42:17,503 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.36 vs. limit=15.0 2023-11-18 09:42:18,375 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 1100, loss[loss=0.144, simple_loss=0.1523, pruned_loss=0.05329, audio_tagging_loss=0.01451, over 14314.00 frames. ], tot_loss[loss=0.1203, simple_loss=0.1295, pruned_loss=0.04341, audio_tagging_loss=0.01213, over 3045041.93 frames. 
], batch size: 56, lr: 2.24e-02, grad_scale: 64.0 2023-11-18 09:42:19,861 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.11 vs. limit=15.0 2023-11-18 09:42:21,530 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 09:42:51,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=167833.33333333334, ans=0.125 2023-11-18 09:43:00,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=167833.33333333334, ans=0.1 2023-11-18 09:43:13,561 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 1150, loss[loss=0.1405, simple_loss=0.1482, pruned_loss=0.05693, audio_tagging_loss=0.009453, over 14647.00 frames. ], tot_loss[loss=0.1203, simple_loss=0.1297, pruned_loss=0.04347, audio_tagging_loss=0.01203, over 3044486.05 frames. ], batch size: 56, lr: 2.24e-02, grad_scale: 64.0 2023-11-18 09:43:17,353 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=15.0 2023-11-18 09:43:29,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=168033.33333333334, ans=0.125 2023-11-18 09:43:34,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=168033.33333333334, ans=0.2 2023-11-18 09:43:35,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=168100.0, ans=0.125 2023-11-18 09:43:37,785 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.563e+01 9.862e+01 1.112e+02 1.270e+02 2.649e+02, threshold=2.225e+02, percent-clipped=1.0 2023-11-18 09:43:49,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=168166.66666666666, ans=0.125 2023-11-18 09:44:09,518 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 1200, loss[loss=0.1437, simple_loss=0.1553, pruned_loss=0.05432, audio_tagging_loss=0.01169, over 15305.00 frames. ], tot_loss[loss=0.1202, simple_loss=0.1294, pruned_loss=0.04345, audio_tagging_loss=0.01208, over 3043831.70 frames. 
], batch size: 56, lr: 2.24e-02, grad_scale: 64.0 2023-11-18 09:44:22,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=168366.66666666666, ans=0.2 2023-11-18 09:44:31,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=168433.33333333334, ans=0.0 2023-11-18 09:44:52,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=168500.0, ans=0.0 2023-11-18 09:44:56,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=168566.66666666666, ans=0.125 2023-11-18 09:45:05,695 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 1250, loss[loss=0.1407, simple_loss=0.1512, pruned_loss=0.05538, audio_tagging_loss=0.009787, over 15570.00 frames. ], tot_loss[loss=0.1209, simple_loss=0.1304, pruned_loss=0.04366, audio_tagging_loss=0.01208, over 3042209.82 frames. ], batch size: 58, lr: 2.24e-02, grad_scale: 64.0 2023-11-18 09:45:27,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=168766.66666666666, ans=0.125 2023-11-18 09:45:28,399 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.825e+01 1.002e+02 1.131e+02 1.253e+02 1.979e+02, threshold=2.263e+02, percent-clipped=0.0 2023-11-18 09:45:42,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=168833.33333333334, ans=0.125 2023-11-18 09:45:43,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=168833.33333333334, ans=0.125 2023-11-18 09:46:00,225 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2023-11-18 09:46:00,854 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 1300, loss[loss=0.1172, simple_loss=0.1147, pruned_loss=0.04516, audio_tagging_loss=0.01467, over 13497.00 frames. ], tot_loss[loss=0.1215, simple_loss=0.1309, pruned_loss=0.04399, audio_tagging_loss=0.01209, over 3036762.30 frames. ], batch size: 53, lr: 2.23e-02, grad_scale: 64.0 2023-11-18 09:46:05,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=168966.66666666666, ans=0.125 2023-11-18 09:46:17,842 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.26 vs. limit=22.5 2023-11-18 09:46:20,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=169033.33333333334, ans=0.125 2023-11-18 09:46:42,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=169166.66666666666, ans=0.2 2023-11-18 09:46:55,925 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 1350, loss[loss=0.1116, simple_loss=0.1166, pruned_loss=0.0399, audio_tagging_loss=0.01344, over 15235.00 frames. ], tot_loss[loss=0.1218, simple_loss=0.1312, pruned_loss=0.04425, audio_tagging_loss=0.01199, over 3040153.35 frames. 
], batch size: 58, lr: 2.23e-02, grad_scale: 64.0 2023-11-18 09:46:55,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=169300.0, ans=0.015 2023-11-18 09:47:05,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=169300.0, ans=0.2 2023-11-18 09:47:07,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=169366.66666666666, ans=0.2 2023-11-18 09:47:08,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=169366.66666666666, ans=0.0 2023-11-18 09:47:18,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=169433.33333333334, ans=0.0 2023-11-18 09:47:18,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=169433.33333333334, ans=0.1 2023-11-18 09:47:19,991 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.798e+01 9.482e+01 1.049e+02 1.147e+02 1.889e+02, threshold=2.098e+02, percent-clipped=0.0 2023-11-18 09:47:25,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=169433.33333333334, ans=0.125 2023-11-18 09:47:36,869 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.08 vs. limit=15.0 2023-11-18 09:47:37,383 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 09:47:43,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=169566.66666666666, ans=0.0 2023-11-18 09:47:49,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=169566.66666666666, ans=0.125 2023-11-18 09:47:52,633 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 1400, loss[loss=0.1116, simple_loss=0.128, pruned_loss=0.03668, audio_tagging_loss=0.01094, over 15030.00 frames. ], tot_loss[loss=0.1209, simple_loss=0.1299, pruned_loss=0.04371, audio_tagging_loss=0.01222, over 3037562.67 frames. ], batch size: 57, lr: 2.23e-02, grad_scale: 64.0 2023-11-18 09:48:05,775 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.27 vs. limit=15.0 2023-11-18 09:48:15,779 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.98 vs. 
limit=10.0 2023-11-18 09:48:20,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=169766.66666666666, ans=0.1 2023-11-18 09:48:20,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=169766.66666666666, ans=0.1 2023-11-18 09:48:30,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=169833.33333333334, ans=0.1 2023-11-18 09:48:31,745 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:48:40,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=169900.0, ans=0.0 2023-11-18 09:48:45,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=169900.0, ans=0.125 2023-11-18 09:48:47,404 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 1450, loss[loss=0.1532, simple_loss=0.1635, pruned_loss=0.06234, audio_tagging_loss=0.009083, over 15482.00 frames. ], tot_loss[loss=0.1224, simple_loss=0.1315, pruned_loss=0.04444, audio_tagging_loss=0.01226, over 3037391.33 frames. ], batch size: 56, lr: 2.23e-02, grad_scale: 64.0 2023-11-18 09:48:49,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=169966.66666666666, ans=0.125 2023-11-18 09:49:11,546 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.160e+01 9.587e+01 1.090e+02 1.197e+02 1.611e+02, threshold=2.181e+02, percent-clipped=0.0 2023-11-18 09:49:26,216 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.02 vs. limit=8.0 2023-11-18 09:49:42,815 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 1500, loss[loss=0.1306, simple_loss=0.1383, pruned_loss=0.04655, audio_tagging_loss=0.01496, over 14908.00 frames. ], tot_loss[loss=0.1223, simple_loss=0.1312, pruned_loss=0.04427, audio_tagging_loss=0.01238, over 3033906.92 frames. ], batch size: 55, lr: 2.23e-02, grad_scale: 64.0 2023-11-18 09:50:04,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=170433.33333333334, ans=0.125 2023-11-18 09:50:04,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=170433.33333333334, ans=0.125 2023-11-18 09:50:04,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=170433.33333333334, ans=0.125 2023-11-18 09:50:19,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=170500.0, ans=0.1 2023-11-18 09:50:23,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=170500.0, ans=0.2 2023-11-18 09:50:27,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=170566.66666666666, ans=0.1 2023-11-18 09:50:39,363 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 1550, loss[loss=0.1035, simple_loss=0.1189, pruned_loss=0.03481, audio_tagging_loss=0.009217, over 16252.00 frames. ], tot_loss[loss=0.1229, simple_loss=0.1322, pruned_loss=0.04439, audio_tagging_loss=0.01246, over 3031734.19 frames. 
], batch size: 60, lr: 2.22e-02, grad_scale: 64.0 2023-11-18 09:50:48,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=170633.33333333334, ans=0.2 2023-11-18 09:50:58,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=170700.0, ans=0.0 2023-11-18 09:50:58,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=170700.0, ans=0.0 2023-11-18 09:51:02,089 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.799e+01 1.007e+02 1.092e+02 1.205e+02 1.689e+02, threshold=2.183e+02, percent-clipped=0.0 2023-11-18 09:51:07,880 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.63 vs. limit=12.0 2023-11-18 09:51:23,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=170900.0, ans=0.1 2023-11-18 09:51:27,504 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.93 vs. limit=15.0 2023-11-18 09:51:33,537 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.10 vs. limit=15.0 2023-11-18 09:51:34,224 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 1600, loss[loss=0.07373, simple_loss=0.07779, pruned_loss=0.02225, audio_tagging_loss=0.01259, over 14263.00 frames. ], tot_loss[loss=0.1232, simple_loss=0.1322, pruned_loss=0.04453, audio_tagging_loss=0.01254, over 3034598.04 frames. ], batch size: 55, lr: 2.22e-02, grad_scale: 64.0 2023-11-18 09:51:40,813 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=4.137e-02 2023-11-18 09:51:42,153 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.26 vs. limit=15.0 2023-11-18 09:51:43,246 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.66 vs. limit=10.0 2023-11-18 09:51:53,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=171033.33333333334, ans=0.125 2023-11-18 09:51:56,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=171100.0, ans=0.0 2023-11-18 09:52:29,520 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 1650, loss[loss=0.1336, simple_loss=0.1331, pruned_loss=0.04731, audio_tagging_loss=0.01971, over 16784.00 frames. ], tot_loss[loss=0.1227, simple_loss=0.1314, pruned_loss=0.04424, audio_tagging_loss=0.01279, over 3040038.56 frames. ], batch size: 61, lr: 2.22e-02, grad_scale: 64.0 2023-11-18 09:52:45,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=171366.66666666666, ans=0.2 2023-11-18 09:52:47,676 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.90 vs. 
limit=15.0 2023-11-18 09:52:50,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=171366.66666666666, ans=0.2 2023-11-18 09:52:53,345 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.843e+01 9.689e+01 1.063e+02 1.242e+02 1.763e+02, threshold=2.126e+02, percent-clipped=0.0 2023-11-18 09:53:02,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=171500.0, ans=0.125 2023-11-18 09:53:14,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=171566.66666666666, ans=0.035 2023-11-18 09:53:26,188 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 1700, loss[loss=0.1488, simple_loss=0.1555, pruned_loss=0.05724, audio_tagging_loss=0.01384, over 15710.00 frames. ], tot_loss[loss=0.1229, simple_loss=0.1318, pruned_loss=0.04421, audio_tagging_loss=0.01279, over 3039726.01 frames. ], batch size: 56, lr: 2.22e-02, grad_scale: 64.0 2023-11-18 09:53:26,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=171633.33333333334, ans=0.125 2023-11-18 09:53:31,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=171633.33333333334, ans=0.2 2023-11-18 09:53:34,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=171633.33333333334, ans=0.0 2023-11-18 09:54:07,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=171833.33333333334, ans=0.2 2023-11-18 09:54:15,110 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.91 vs. limit=15.0 2023-11-18 09:54:17,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=171900.0, ans=0.1 2023-11-18 09:54:20,939 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 1750, loss[loss=0.1213, simple_loss=0.1276, pruned_loss=0.04317, audio_tagging_loss=0.01436, over 14810.00 frames. ], tot_loss[loss=0.1218, simple_loss=0.1306, pruned_loss=0.04378, audio_tagging_loss=0.01273, over 3040059.25 frames. ], batch size: 58, lr: 2.22e-02, grad_scale: 64.0 2023-11-18 09:54:44,392 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 9.833e+01 1.116e+02 1.265e+02 1.757e+02, threshold=2.232e+02, percent-clipped=0.0 2023-11-18 09:54:59,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=172166.66666666666, ans=0.125 2023-11-18 09:55:03,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=172166.66666666666, ans=0.125 2023-11-18 09:55:16,038 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 1800, loss[loss=0.1223, simple_loss=0.1327, pruned_loss=0.04442, audio_tagging_loss=0.01157, over 14404.00 frames. ], tot_loss[loss=0.1211, simple_loss=0.1302, pruned_loss=0.0435, audio_tagging_loss=0.0125, over 3042053.68 frames. 
], batch size: 52, lr: 2.21e-02, grad_scale: 64.0 2023-11-18 09:55:38,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=172433.33333333334, ans=0.0 2023-11-18 09:55:38,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=172433.33333333334, ans=0.125 2023-11-18 09:55:40,788 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.64 vs. limit=22.5 2023-11-18 09:55:47,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=172433.33333333334, ans=0.125 2023-11-18 09:55:48,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=172500.0, ans=0.0 2023-11-18 09:56:12,362 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 1850, loss[loss=0.116, simple_loss=0.1341, pruned_loss=0.0373, audio_tagging_loss=0.01167, over 15568.00 frames. ], tot_loss[loss=0.1212, simple_loss=0.1304, pruned_loss=0.0435, audio_tagging_loss=0.01248, over 3040668.98 frames. ], batch size: 56, lr: 2.21e-02, grad_scale: 64.0 2023-11-18 09:56:12,659 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:56:21,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=172633.33333333334, ans=0.025 2023-11-18 09:56:28,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=172700.0, ans=0.2 2023-11-18 09:56:34,451 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.476e+01 9.349e+01 1.016e+02 1.150e+02 1.872e+02, threshold=2.031e+02, percent-clipped=0.0 2023-11-18 09:56:57,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=172900.0, ans=0.1 2023-11-18 09:56:57,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=172900.0, ans=0.0 2023-11-18 09:57:02,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=172900.0, ans=0.0 2023-11-18 09:57:07,223 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 1900, loss[loss=0.1014, simple_loss=0.1011, pruned_loss=0.03862, audio_tagging_loss=0.01225, over 13307.00 frames. ], tot_loss[loss=0.1213, simple_loss=0.1308, pruned_loss=0.04359, audio_tagging_loss=0.01234, over 3043231.07 frames. ], batch size: 52, lr: 2.21e-02, grad_scale: 64.0 2023-11-18 09:57:11,624 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:57:41,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=173166.66666666666, ans=0.025 2023-11-18 09:57:57,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=173233.33333333334, ans=0.2 2023-11-18 09:58:02,678 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 1950, loss[loss=0.11, simple_loss=0.1203, pruned_loss=0.03964, audio_tagging_loss=0.01026, over 15825.00 frames. 
], tot_loss[loss=0.1216, simple_loss=0.1313, pruned_loss=0.04372, audio_tagging_loss=0.01225, over 3040068.19 frames. ], batch size: 61, lr: 2.21e-02, grad_scale: 64.0 2023-11-18 09:58:04,225 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0 2023-11-18 09:58:25,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=173433.33333333334, ans=0.0 2023-11-18 09:58:26,728 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.049e+01 9.559e+01 1.056e+02 1.197e+02 1.715e+02, threshold=2.112e+02, percent-clipped=0.0 2023-11-18 09:58:42,280 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.45 vs. limit=15.0 2023-11-18 09:58:45,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=173500.0, ans=10.0 2023-11-18 09:58:58,537 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 2000, loss[loss=0.09865, simple_loss=0.1004, pruned_loss=0.03437, audio_tagging_loss=0.01408, over 16401.00 frames. ], tot_loss[loss=0.1203, simple_loss=0.1298, pruned_loss=0.04313, audio_tagging_loss=0.01228, over 3042577.40 frames. ], batch size: 62, lr: 2.21e-02, grad_scale: 64.0 2023-11-18 09:59:31,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=173833.33333333334, ans=0.0 2023-11-18 09:59:43,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=173900.0, ans=0.2 2023-11-18 09:59:47,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=173900.0, ans=0.125 2023-11-18 09:59:53,988 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 2050, loss[loss=0.1018, simple_loss=0.104, pruned_loss=0.03772, audio_tagging_loss=0.01207, over 15038.00 frames. ], tot_loss[loss=0.1217, simple_loss=0.1315, pruned_loss=0.04378, audio_tagging_loss=0.01217, over 3046228.20 frames. ], batch size: 57, lr: 2.20e-02, grad_scale: 64.0 2023-11-18 09:59:57,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=173966.66666666666, ans=0.0 2023-11-18 10:00:13,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=174033.33333333334, ans=6.0 2023-11-18 10:00:14,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=174100.0, ans=0.0 2023-11-18 10:00:16,275 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.215e+01 1.049e+02 1.194e+02 1.365e+02 2.043e+02, threshold=2.387e+02, percent-clipped=0.0 2023-11-18 10:00:18,449 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.60 vs. limit=15.0 2023-11-18 10:00:48,869 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 2100, loss[loss=0.1218, simple_loss=0.1392, pruned_loss=0.0394, audio_tagging_loss=0.01283, over 16922.00 frames. ], tot_loss[loss=0.1215, simple_loss=0.1314, pruned_loss=0.0436, audio_tagging_loss=0.0122, over 3051247.91 frames. 
], batch size: 63, lr: 2.20e-02, grad_scale: 128.0 2023-11-18 10:01:08,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=174366.66666666666, ans=0.1 2023-11-18 10:01:17,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=174433.33333333334, ans=0.07 2023-11-18 10:01:17,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=174433.33333333334, ans=0.125 2023-11-18 10:01:18,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=174433.33333333334, ans=0.04949747468305833 2023-11-18 10:01:21,865 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:01:25,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=174500.0, ans=0.0 2023-11-18 10:01:27,214 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.56 vs. limit=15.0 2023-11-18 10:01:31,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=174500.0, ans=0.1 2023-11-18 10:01:44,261 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 2150, loss[loss=0.122, simple_loss=0.1258, pruned_loss=0.04681, audio_tagging_loss=0.01234, over 15137.00 frames. ], tot_loss[loss=0.1212, simple_loss=0.1311, pruned_loss=0.04345, audio_tagging_loss=0.01221, over 3046529.37 frames. ], batch size: 58, lr: 2.20e-02, grad_scale: 128.0 2023-11-18 10:01:49,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=174633.33333333334, ans=0.125 2023-11-18 10:02:02,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=174700.0, ans=0.125 2023-11-18 10:02:08,228 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.984e+01 9.849e+01 1.118e+02 1.250e+02 1.648e+02, threshold=2.236e+02, percent-clipped=0.0 2023-11-18 10:02:18,835 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:02:35,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=174900.0, ans=0.2 2023-11-18 10:02:41,108 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 2200, loss[loss=0.07108, simple_loss=0.06532, pruned_loss=0.01972, audio_tagging_loss=0.0187, over 15906.00 frames. ], tot_loss[loss=0.1202, simple_loss=0.1298, pruned_loss=0.04302, audio_tagging_loss=0.01228, over 3048522.46 frames. 
], batch size: 62, lr: 2.20e-02, grad_scale: 128.0 2023-11-18 10:02:41,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=174966.66666666666, ans=0.125 2023-11-18 10:02:48,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=174966.66666666666, ans=0.125 2023-11-18 10:03:03,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=175100.0, ans=0.125 2023-11-18 10:03:09,232 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:03:20,686 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.98 vs. limit=15.0 2023-11-18 10:03:34,634 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.39 vs. limit=15.0 2023-11-18 10:03:36,346 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 2250, loss[loss=0.08809, simple_loss=0.09832, pruned_loss=0.02671, audio_tagging_loss=0.01223, over 15175.00 frames. ], tot_loss[loss=0.1195, simple_loss=0.129, pruned_loss=0.04269, audio_tagging_loss=0.01234, over 3047438.43 frames. ], batch size: 58, lr: 2.20e-02, grad_scale: 64.0 2023-11-18 10:04:00,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=175433.33333333334, ans=0.1 2023-11-18 10:04:00,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=175433.33333333334, ans=0.125 2023-11-18 10:04:01,347 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 9.758e+01 1.067e+02 1.178e+02 1.415e+02, threshold=2.133e+02, percent-clipped=0.0 2023-11-18 10:04:11,108 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.14 vs. limit=10.0 2023-11-18 10:04:32,095 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 2300, loss[loss=0.132, simple_loss=0.1515, pruned_loss=0.04797, audio_tagging_loss=0.008323, over 14975.00 frames. ], tot_loss[loss=0.1194, simple_loss=0.1288, pruned_loss=0.04264, audio_tagging_loss=0.01239, over 3042668.31 frames. ], batch size: 53, lr: 2.19e-02, grad_scale: 64.0 2023-11-18 10:04:32,673 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.82 vs. 
limit=15.0 2023-11-18 10:04:33,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=175633.33333333334, ans=0.07 2023-11-18 10:04:34,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=175633.33333333334, ans=0.0 2023-11-18 10:04:35,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=175633.33333333334, ans=0.0 2023-11-18 10:04:42,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=175700.0, ans=0.1 2023-11-18 10:04:46,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=175700.0, ans=0.0 2023-11-18 10:04:54,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=175766.66666666666, ans=0.125 2023-11-18 10:04:59,536 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.09 vs. limit=15.0 2023-11-18 10:05:07,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.07 vs. limit=6.0 2023-11-18 10:05:13,711 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.88 vs. limit=6.0 2023-11-18 10:05:16,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=175900.0, ans=0.125 2023-11-18 10:05:22,649 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:05:23,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=175900.0, ans=0.2 2023-11-18 10:05:27,915 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 2350, loss[loss=0.1271, simple_loss=0.1417, pruned_loss=0.04111, audio_tagging_loss=0.01514, over 16100.00 frames. ], tot_loss[loss=0.1207, simple_loss=0.1298, pruned_loss=0.04325, audio_tagging_loss=0.01254, over 3048227.10 frames. 
], batch size: 59, lr: 2.19e-02, grad_scale: 64.0 2023-11-18 10:05:36,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=175966.66666666666, ans=0.0 2023-11-18 10:05:49,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=176100.0, ans=0.125 2023-11-18 10:05:51,951 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.821e+01 9.805e+01 1.113e+02 1.261e+02 1.707e+02, threshold=2.226e+02, percent-clipped=0.0 2023-11-18 10:06:22,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=176300.0, ans=0.0 2023-11-18 10:06:23,641 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 2400, loss[loss=0.1129, simple_loss=0.1282, pruned_loss=0.03787, audio_tagging_loss=0.01095, over 14245.00 frames. ], tot_loss[loss=0.1208, simple_loss=0.1302, pruned_loss=0.04318, audio_tagging_loss=0.01252, over 3043745.77 frames. ], batch size: 55, lr: 2.19e-02, grad_scale: 32.0 2023-11-18 10:07:09,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=176566.66666666666, ans=0.125 2023-11-18 10:07:19,257 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 2450, loss[loss=0.1629, simple_loss=0.1742, pruned_loss=0.06511, audio_tagging_loss=0.01068, over 15078.00 frames. ], tot_loss[loss=0.1212, simple_loss=0.1304, pruned_loss=0.04345, audio_tagging_loss=0.0125, over 3045279.33 frames. ], batch size: 56, lr: 2.19e-02, grad_scale: 32.0 2023-11-18 10:07:30,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=176700.0, ans=0.1 2023-11-18 10:07:45,100 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.721e+01 1.013e+02 1.126e+02 1.298e+02 2.274e+02, threshold=2.253e+02, percent-clipped=1.0 2023-11-18 10:07:46,795 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.95 vs. limit=15.0 2023-11-18 10:08:15,426 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 2500, loss[loss=0.1179, simple_loss=0.1277, pruned_loss=0.04031, audio_tagging_loss=0.01376, over 16388.00 frames. ], tot_loss[loss=0.1213, simple_loss=0.1309, pruned_loss=0.04338, audio_tagging_loss=0.01247, over 3060520.53 frames. ], batch size: 61, lr: 2.19e-02, grad_scale: 32.0 2023-11-18 10:08:16,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=176966.66666666666, ans=0.0 2023-11-18 10:08:27,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=177033.33333333334, ans=0.0 2023-11-18 10:08:37,572 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.65 vs. limit=22.5 2023-11-18 10:09:05,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=177233.33333333334, ans=0.95 2023-11-18 10:09:10,976 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 2550, loss[loss=0.08419, simple_loss=0.09377, pruned_loss=0.02719, audio_tagging_loss=0.01012, over 14392.00 frames. ], tot_loss[loss=0.1204, simple_loss=0.1302, pruned_loss=0.04297, audio_tagging_loss=0.01233, over 3047816.78 frames. 
], batch size: 55, lr: 2.18e-02, grad_scale: 32.0 2023-11-18 10:09:17,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=177300.0, ans=0.0 2023-11-18 10:09:27,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=177366.66666666666, ans=0.2 2023-11-18 10:09:36,582 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.76 vs. limit=15.0 2023-11-18 10:09:37,085 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.896e+01 9.703e+01 1.094e+02 1.267e+02 1.679e+02, threshold=2.187e+02, percent-clipped=0.0 2023-11-18 10:09:48,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=177500.0, ans=0.1 2023-11-18 10:09:52,595 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.75 vs. limit=15.0 2023-11-18 10:10:06,250 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 2600, loss[loss=0.09194, simple_loss=0.1105, pruned_loss=0.02485, audio_tagging_loss=0.01185, over 15221.00 frames. ], tot_loss[loss=0.1197, simple_loss=0.1294, pruned_loss=0.04278, audio_tagging_loss=0.01219, over 3050562.20 frames. ], batch size: 56, lr: 2.18e-02, grad_scale: 32.0 2023-11-18 10:10:13,512 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.26 vs. limit=15.0 2023-11-18 10:10:13,558 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.05 vs. limit=12.0 2023-11-18 10:10:21,935 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0 2023-11-18 10:10:40,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=177833.33333333334, ans=0.0 2023-11-18 10:11:03,108 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 2650, loss[loss=0.1036, simple_loss=0.1119, pruned_loss=0.03326, audio_tagging_loss=0.01439, over 14128.00 frames. ], tot_loss[loss=0.1185, simple_loss=0.1278, pruned_loss=0.0425, audio_tagging_loss=0.01212, over 3048016.91 frames. ], batch size: 55, lr: 2.18e-02, grad_scale: 32.0 2023-11-18 10:11:09,969 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.92 vs. limit=15.0 2023-11-18 10:11:15,030 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.74 vs. limit=15.0 2023-11-18 10:11:27,750 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 9.789e+01 1.066e+02 1.192e+02 1.496e+02, threshold=2.133e+02, percent-clipped=0.0 2023-11-18 10:11:29,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=178100.0, ans=0.0 2023-11-18 10:11:30,354 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.37 vs. 
limit=15.0 2023-11-18 10:11:45,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=178166.66666666666, ans=0.125 2023-11-18 10:11:49,036 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=15.0 2023-11-18 10:11:57,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=178300.0, ans=0.125 2023-11-18 10:11:57,883 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 2700, loss[loss=0.1362, simple_loss=0.1562, pruned_loss=0.04756, audio_tagging_loss=0.01056, over 16619.00 frames. ], tot_loss[loss=0.1188, simple_loss=0.1285, pruned_loss=0.04246, audio_tagging_loss=0.01205, over 3055155.93 frames. ], batch size: 61, lr: 2.18e-02, grad_scale: 32.0 2023-11-18 10:12:03,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.34 vs. limit=10.0 2023-11-18 10:12:11,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=178366.66666666666, ans=0.0 2023-11-18 10:12:20,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=178433.33333333334, ans=0.2 2023-11-18 10:12:39,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=178500.0, ans=0.0 2023-11-18 10:12:53,261 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 2750, loss[loss=0.09086, simple_loss=0.08987, pruned_loss=0.03177, audio_tagging_loss=0.01416, over 15053.00 frames. ], tot_loss[loss=0.1179, simple_loss=0.1276, pruned_loss=0.04201, audio_tagging_loss=0.01205, over 3053425.01 frames. ], batch size: 60, lr: 2.18e-02, grad_scale: 32.0 2023-11-18 10:13:02,899 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.59 vs. limit=15.0 2023-11-18 10:13:05,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=178700.0, ans=0.0 2023-11-18 10:13:06,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=178700.0, ans=0.0 2023-11-18 10:13:19,275 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.876e+01 1.008e+02 1.122e+02 1.241e+02 2.001e+02, threshold=2.244e+02, percent-clipped=0.0 2023-11-18 10:13:29,556 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=14.87 vs. limit=15.0 2023-11-18 10:13:34,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=178833.33333333334, ans=0.125 2023-11-18 10:13:38,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=178900.0, ans=0.125 2023-11-18 10:13:41,738 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:13:50,108 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 2800, loss[loss=0.1051, simple_loss=0.1249, pruned_loss=0.03184, audio_tagging_loss=0.01075, over 15049.00 frames. ], tot_loss[loss=0.116, simple_loss=0.1254, pruned_loss=0.04122, audio_tagging_loss=0.01207, over 3050300.86 frames. ], batch size: 59, lr: 2.18e-02, grad_scale: 32.0 2023-11-18 10:14:37,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=179233.33333333334, ans=0.0 2023-11-18 10:14:42,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=179233.33333333334, ans=0.125 2023-11-18 10:14:44,619 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 2850, loss[loss=0.1213, simple_loss=0.1289, pruned_loss=0.04473, audio_tagging_loss=0.0121, over 15132.00 frames. ], tot_loss[loss=0.1177, simple_loss=0.1272, pruned_loss=0.04202, audio_tagging_loss=0.01204, over 3048481.21 frames. ], batch size: 56, lr: 2.17e-02, grad_scale: 32.0 2023-11-18 10:14:44,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=179300.0, ans=0.0 2023-11-18 10:14:51,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=179300.0, ans=0.125 2023-11-18 10:15:02,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=179366.66666666666, ans=0.2 2023-11-18 10:15:10,732 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.435e+01 9.812e+01 1.069e+02 1.186e+02 1.678e+02, threshold=2.137e+02, percent-clipped=0.0 2023-11-18 10:15:28,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=179566.66666666666, ans=0.125 2023-11-18 10:15:34,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=179566.66666666666, ans=0.0 2023-11-18 10:15:39,681 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 2900, loss[loss=0.147, simple_loss=0.1577, pruned_loss=0.05659, audio_tagging_loss=0.01157, over 15080.00 frames. ], tot_loss[loss=0.1179, simple_loss=0.1274, pruned_loss=0.04217, audio_tagging_loss=0.01208, over 3040356.33 frames. 
], batch size: 54, lr: 2.17e-02, grad_scale: 32.0 2023-11-18 10:15:46,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=179633.33333333334, ans=0.1 2023-11-18 10:15:47,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=179633.33333333334, ans=0.0 2023-11-18 10:15:54,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=179700.0, ans=0.125 2023-11-18 10:15:58,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=179700.0, ans=0.035 2023-11-18 10:15:59,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=179700.0, ans=0.0 2023-11-18 10:16:36,725 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 2950, loss[loss=0.1117, simple_loss=0.1143, pruned_loss=0.04158, audio_tagging_loss=0.01296, over 15621.00 frames. ], tot_loss[loss=0.1178, simple_loss=0.127, pruned_loss=0.04204, audio_tagging_loss=0.0122, over 3051413.56 frames. ], batch size: 58, lr: 2.17e-02, grad_scale: 32.0 2023-11-18 10:17:01,088 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.805e+01 9.762e+01 1.073e+02 1.254e+02 1.837e+02, threshold=2.146e+02, percent-clipped=0.0 2023-11-18 10:17:07,089 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.77 vs. limit=15.0 2023-11-18 10:17:24,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=180233.33333333334, ans=0.125 2023-11-18 10:17:32,030 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 3000, loss[loss=0.09397, simple_loss=0.1, pruned_loss=0.02974, audio_tagging_loss=0.01424, over 15093.00 frames. ], tot_loss[loss=0.1181, simple_loss=0.1274, pruned_loss=0.04208, audio_tagging_loss=0.01231, over 3050422.19 frames. ], batch size: 56, lr: 2.17e-02, grad_scale: 32.0 2023-11-18 10:17:32,031 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-18 10:18:05,507 INFO [train_asr.py:1147] (3/4) Epoch 3, validation: loss=0.08163, simple_loss=0.06585, pruned_loss=0.01265, audio_tagging_loss=0.03605, over 4681554.00 frames. 2023-11-18 10:18:05,508 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-18 10:18:07,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=180300.0, ans=0.0 2023-11-18 10:18:10,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=180300.0, ans=0.125 2023-11-18 10:18:25,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=180366.66666666666, ans=0.125 2023-11-18 10:18:31,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=180433.33333333334, ans=0.0 2023-11-18 10:19:01,886 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 3050, loss[loss=0.1175, simple_loss=0.1299, pruned_loss=0.03971, audio_tagging_loss=0.01287, over 16623.00 frames. ], tot_loss[loss=0.1188, simple_loss=0.1283, pruned_loss=0.04244, audio_tagging_loss=0.01225, over 3053122.32 frames. 
], batch size: 60, lr: 2.17e-02, grad_scale: 32.0 2023-11-18 10:19:10,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=180633.33333333334, ans=0.125 2023-11-18 10:19:26,064 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.721e+01 9.626e+01 1.059e+02 1.215e+02 1.726e+02, threshold=2.118e+02, percent-clipped=0.0 2023-11-18 10:19:33,994 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:19:56,865 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 3100, loss[loss=0.1091, simple_loss=0.1271, pruned_loss=0.03164, audio_tagging_loss=0.01395, over 15471.00 frames. ], tot_loss[loss=0.1196, simple_loss=0.1292, pruned_loss=0.0427, audio_tagging_loss=0.01229, over 3053132.14 frames. ], batch size: 56, lr: 2.16e-02, grad_scale: 32.0 2023-11-18 10:19:58,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=180966.66666666666, ans=0.04949747468305833 2023-11-18 10:20:21,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=181100.0, ans=0.0 2023-11-18 10:20:34,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=181166.66666666666, ans=0.125 2023-11-18 10:20:37,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=181166.66666666666, ans=0.125 2023-11-18 10:20:51,942 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 3150, loss[loss=0.1383, simple_loss=0.1552, pruned_loss=0.05272, audio_tagging_loss=0.007996, over 15297.00 frames. ], tot_loss[loss=0.1193, simple_loss=0.1292, pruned_loss=0.04236, audio_tagging_loss=0.0123, over 3051692.21 frames. ], batch size: 56, lr: 2.16e-02, grad_scale: 32.0 2023-11-18 10:21:07,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=181366.66666666666, ans=0.2 2023-11-18 10:21:08,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=181366.66666666666, ans=0.2 2023-11-18 10:21:09,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=181366.66666666666, ans=0.0 2023-11-18 10:21:18,114 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.858e+01 9.872e+01 1.154e+02 1.398e+02 2.452e+02, threshold=2.308e+02, percent-clipped=3.0 2023-11-18 10:21:30,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=181500.0, ans=0.125 2023-11-18 10:21:33,310 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.41 vs. 
limit=22.5 2023-11-18 10:21:35,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=181566.66666666666, ans=0.125 2023-11-18 10:21:37,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=181566.66666666666, ans=0.2 2023-11-18 10:21:39,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=181566.66666666666, ans=0.125 2023-11-18 10:21:41,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=181566.66666666666, ans=0.125 2023-11-18 10:21:44,103 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:21:46,665 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.05 vs. limit=15.0 2023-11-18 10:21:48,260 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 3200, loss[loss=0.1426, simple_loss=0.1589, pruned_loss=0.05127, audio_tagging_loss=0.01189, over 15405.00 frames. ], tot_loss[loss=0.1197, simple_loss=0.1295, pruned_loss=0.04255, audio_tagging_loss=0.0124, over 3054344.66 frames. ], batch size: 56, lr: 2.16e-02, grad_scale: 32.0 2023-11-18 10:21:56,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=181633.33333333334, ans=0.09899494936611666 2023-11-18 10:22:07,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=181700.0, ans=0.1 2023-11-18 10:22:09,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=181766.66666666666, ans=0.2 2023-11-18 10:22:23,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=181833.33333333334, ans=0.0 2023-11-18 10:22:31,576 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.488e-01 2023-11-18 10:22:42,989 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 3250, loss[loss=0.1456, simple_loss=0.151, pruned_loss=0.05814, audio_tagging_loss=0.01195, over 14356.00 frames. ], tot_loss[loss=0.1188, simple_loss=0.128, pruned_loss=0.04211, audio_tagging_loss=0.01265, over 3048157.63 frames. ], batch size: 56, lr: 2.16e-02, grad_scale: 32.0 2023-11-18 10:22:44,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=181966.66666666666, ans=0.2 2023-11-18 10:22:59,105 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.71 vs. 
limit=15.0 2023-11-18 10:23:02,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=182033.33333333334, ans=0.1 2023-11-18 10:23:07,864 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.200e+01 9.562e+01 1.039e+02 1.190e+02 1.635e+02, threshold=2.078e+02, percent-clipped=0.0 2023-11-18 10:23:10,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=182100.0, ans=0.0 2023-11-18 10:23:14,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=182100.0, ans=0.0 2023-11-18 10:23:17,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=182166.66666666666, ans=0.2 2023-11-18 10:23:37,519 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 3300, loss[loss=0.1036, simple_loss=0.1048, pruned_loss=0.03441, audio_tagging_loss=0.01681, over 15609.00 frames. ], tot_loss[loss=0.118, simple_loss=0.127, pruned_loss=0.04178, audio_tagging_loss=0.01275, over 3039805.18 frames. ], batch size: 59, lr: 2.16e-02, grad_scale: 32.0 2023-11-18 10:23:38,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=182300.0, ans=0.2 2023-11-18 10:23:57,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=182366.66666666666, ans=0.04949747468305833 2023-11-18 10:23:57,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=182366.66666666666, ans=0.125 2023-11-18 10:23:58,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=182366.66666666666, ans=0.125 2023-11-18 10:24:04,218 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.10 vs. limit=12.0 2023-11-18 10:24:07,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=182433.33333333334, ans=0.125 2023-11-18 10:24:10,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=182500.0, ans=0.0 2023-11-18 10:24:13,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=182500.0, ans=0.2 2023-11-18 10:24:27,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=182566.66666666666, ans=0.1 2023-11-18 10:24:33,771 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 3350, loss[loss=0.1188, simple_loss=0.1333, pruned_loss=0.04183, audio_tagging_loss=0.01029, over 15642.00 frames. ], tot_loss[loss=0.1185, simple_loss=0.1277, pruned_loss=0.0421, audio_tagging_loss=0.01254, over 3048246.97 frames. ], batch size: 58, lr: 2.15e-02, grad_scale: 32.0 2023-11-18 10:24:49,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=182700.0, ans=0.1 2023-11-18 10:24:50,133 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.67 vs. 
limit=15.0 2023-11-18 10:24:58,930 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.629e+01 9.499e+01 1.070e+02 1.220e+02 2.186e+02, threshold=2.139e+02, percent-clipped=1.0 2023-11-18 10:25:17,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=182900.0, ans=0.1 2023-11-18 10:25:29,561 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 3400, loss[loss=0.117, simple_loss=0.1269, pruned_loss=0.04086, audio_tagging_loss=0.01265, over 13932.00 frames. ], tot_loss[loss=0.1182, simple_loss=0.1276, pruned_loss=0.04198, audio_tagging_loss=0.01239, over 3044267.68 frames. ], batch size: 52, lr: 2.15e-02, grad_scale: 32.0 2023-11-18 10:25:39,506 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.22 vs. limit=6.0 2023-11-18 10:25:43,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=183033.33333333334, ans=0.2 2023-11-18 10:25:46,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=183033.33333333334, ans=0.125 2023-11-18 10:25:48,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=183033.33333333334, ans=0.0 2023-11-18 10:25:51,814 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.88 vs. limit=15.0 2023-11-18 10:26:07,667 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:26:08,045 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.97 vs. limit=15.0 2023-11-18 10:26:09,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=183166.66666666666, ans=0.0 2023-11-18 10:26:13,470 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.01 vs. limit=10.0 2023-11-18 10:26:17,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=183233.33333333334, ans=0.125 2023-11-18 10:26:24,425 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 3450, loss[loss=0.08419, simple_loss=0.09094, pruned_loss=0.02448, audio_tagging_loss=0.01424, over 14696.00 frames. ], tot_loss[loss=0.1172, simple_loss=0.1269, pruned_loss=0.04143, audio_tagging_loss=0.01233, over 3048301.98 frames. ], batch size: 56, lr: 2.15e-02, grad_scale: 32.0 2023-11-18 10:26:33,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=183300.0, ans=0.0 2023-11-18 10:26:45,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=183433.33333333334, ans=0.95 2023-11-18 10:26:50,541 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 9.538e+01 1.062e+02 1.197e+02 2.158e+02, threshold=2.124e+02, percent-clipped=1.0 2023-11-18 10:26:53,076 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.06 vs. 
limit=15.0 2023-11-18 10:26:53,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=183433.33333333334, ans=0.0 2023-11-18 10:27:07,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=183566.66666666666, ans=0.125 2023-11-18 10:27:20,069 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 3500, loss[loss=0.1269, simple_loss=0.1382, pruned_loss=0.04214, audio_tagging_loss=0.01567, over 15615.00 frames. ], tot_loss[loss=0.1182, simple_loss=0.1281, pruned_loss=0.04193, audio_tagging_loss=0.0122, over 3048253.84 frames. ], batch size: 59, lr: 2.15e-02, grad_scale: 32.0 2023-11-18 10:27:40,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=183700.0, ans=0.0 2023-11-18 10:27:40,815 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.30 vs. limit=10.0 2023-11-18 10:27:41,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=183766.66666666666, ans=0.035 2023-11-18 10:27:48,625 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:28:01,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=183833.33333333334, ans=0.5 2023-11-18 10:28:02,994 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.11 vs. limit=15.0 2023-11-18 10:28:11,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=183900.0, ans=0.125 2023-11-18 10:28:15,712 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 3550, loss[loss=0.1071, simple_loss=0.1122, pruned_loss=0.03716, audio_tagging_loss=0.01386, over 14047.00 frames. ], tot_loss[loss=0.1173, simple_loss=0.1267, pruned_loss=0.04171, audio_tagging_loss=0.01229, over 3044508.40 frames. 
], batch size: 53, lr: 2.15e-02, grad_scale: 32.0 2023-11-18 10:28:21,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=183966.66666666666, ans=0.125 2023-11-18 10:28:22,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=183966.66666666666, ans=0.0 2023-11-18 10:28:33,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=184033.33333333334, ans=0.125 2023-11-18 10:28:41,221 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.814e+01 9.739e+01 1.081e+02 1.236e+02 3.784e+02, threshold=2.163e+02, percent-clipped=1.0 2023-11-18 10:28:49,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=184166.66666666666, ans=0.125 2023-11-18 10:29:05,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=184233.33333333334, ans=0.07 2023-11-18 10:29:07,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=184233.33333333334, ans=0.0 2023-11-18 10:29:11,464 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 3600, loss[loss=0.1031, simple_loss=0.1057, pruned_loss=0.03969, audio_tagging_loss=0.01053, over 14173.00 frames. ], tot_loss[loss=0.1165, simple_loss=0.1257, pruned_loss=0.04132, audio_tagging_loss=0.01234, over 3047476.89 frames. ], batch size: 53, lr: 2.15e-02, grad_scale: 32.0 2023-11-18 10:29:12,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=184300.0, ans=0.2 2023-11-18 10:29:14,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=184300.0, ans=0.125 2023-11-18 10:29:20,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=184300.0, ans=0.125 2023-11-18 10:29:34,550 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.89 vs. limit=6.0 2023-11-18 10:29:46,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=184500.0, ans=0.125 2023-11-18 10:30:02,118 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.63 vs. limit=15.0 2023-11-18 10:30:05,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=184566.66666666666, ans=0.0 2023-11-18 10:30:06,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=184633.33333333334, ans=0.2 2023-11-18 10:30:06,995 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 3650, loss[loss=0.1563, simple_loss=0.1813, pruned_loss=0.05478, audio_tagging_loss=0.01084, over 15877.00 frames. ], tot_loss[loss=0.1179, simple_loss=0.1274, pruned_loss=0.04202, audio_tagging_loss=0.01216, over 3044558.06 frames. 
], batch size: 57, lr: 2.14e-02, grad_scale: 32.0 2023-11-18 10:30:11,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=184633.33333333334, ans=0.0 2023-11-18 10:30:13,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=184633.33333333334, ans=0.0 2023-11-18 10:30:19,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=184700.0, ans=0.015 2023-11-18 10:30:21,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=184700.0, ans=0.0 2023-11-18 10:30:29,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=184766.66666666666, ans=0.1 2023-11-18 10:30:29,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=184766.66666666666, ans=0.2 2023-11-18 10:30:32,818 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.231e+01 1.003e+02 1.085e+02 1.205e+02 1.999e+02, threshold=2.169e+02, percent-clipped=0.0 2023-11-18 10:30:44,515 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.25 vs. limit=12.0 2023-11-18 10:30:45,798 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.01 vs. limit=15.0 2023-11-18 10:30:48,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=184833.33333333334, ans=0.125 2023-11-18 10:30:51,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=184900.0, ans=0.125 2023-11-18 10:31:02,921 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 3700, loss[loss=0.1316, simple_loss=0.1242, pruned_loss=0.05341, audio_tagging_loss=0.01611, over 13734.00 frames. ], tot_loss[loss=0.1175, simple_loss=0.1266, pruned_loss=0.04177, audio_tagging_loss=0.01237, over 3043884.06 frames. ], batch size: 53, lr: 2.14e-02, grad_scale: 32.0 2023-11-18 10:31:04,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=184966.66666666666, ans=0.125 2023-11-18 10:31:14,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=185033.33333333334, ans=0.0 2023-11-18 10:31:20,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=185033.33333333334, ans=0.07 2023-11-18 10:31:58,473 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 3750, loss[loss=0.1625, simple_loss=0.1685, pruned_loss=0.06721, audio_tagging_loss=0.01101, over 14997.00 frames. ], tot_loss[loss=0.1185, simple_loss=0.1278, pruned_loss=0.04216, audio_tagging_loss=0.01238, over 3045488.03 frames. ], batch size: 56, lr: 2.14e-02, grad_scale: 32.0 2023-11-18 10:32:00,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=185300.0, ans=0.125 2023-11-18 10:32:04,470 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.23 vs. 
limit=12.0 2023-11-18 10:32:08,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=185300.0, ans=0.125 2023-11-18 10:32:20,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=185433.33333333334, ans=0.125 2023-11-18 10:32:24,707 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.955e+01 9.670e+01 1.084e+02 1.200e+02 2.427e+02, threshold=2.168e+02, percent-clipped=1.0 2023-11-18 10:32:37,384 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:32:52,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=185566.66666666666, ans=0.95 2023-11-18 10:32:54,192 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 3800, loss[loss=0.1096, simple_loss=0.1205, pruned_loss=0.03675, audio_tagging_loss=0.01261, over 15566.00 frames. ], tot_loss[loss=0.1185, simple_loss=0.1281, pruned_loss=0.04208, audio_tagging_loss=0.01233, over 3046196.57 frames. ], batch size: 57, lr: 2.14e-02, grad_scale: 32.0 2023-11-18 10:33:10,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=185700.0, ans=0.125 2023-11-18 10:33:12,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=185700.0, ans=10.0 2023-11-18 10:33:14,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=185700.0, ans=0.0 2023-11-18 10:33:15,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=185766.66666666666, ans=0.125 2023-11-18 10:33:17,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=185766.66666666666, ans=0.125 2023-11-18 10:33:22,572 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.555e+00 2023-11-18 10:33:27,078 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.72 vs. limit=12.0 2023-11-18 10:33:39,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=185900.0, ans=0.125 2023-11-18 10:33:40,805 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.27 vs. limit=15.0 2023-11-18 10:33:43,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=185900.0, ans=0.125 2023-11-18 10:33:50,167 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 3850, loss[loss=0.1085, simple_loss=0.1203, pruned_loss=0.03676, audio_tagging_loss=0.01161, over 15654.00 frames. ], tot_loss[loss=0.1177, simple_loss=0.1269, pruned_loss=0.0418, audio_tagging_loss=0.01239, over 3047539.47 frames. 
], batch size: 59, lr: 2.14e-02, grad_scale: 32.0 2023-11-18 10:33:54,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=185966.66666666666, ans=10.0 2023-11-18 10:34:10,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=186033.33333333334, ans=0.125 2023-11-18 10:34:15,529 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 1.009e+02 1.117e+02 1.269e+02 1.869e+02, threshold=2.233e+02, percent-clipped=0.0 2023-11-18 10:34:41,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=186233.33333333334, ans=0.0 2023-11-18 10:34:45,159 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 3900, loss[loss=0.09516, simple_loss=0.1049, pruned_loss=0.02981, audio_tagging_loss=0.01289, over 14752.00 frames. ], tot_loss[loss=0.1173, simple_loss=0.1267, pruned_loss=0.04154, audio_tagging_loss=0.01239, over 3039946.80 frames. ], batch size: 57, lr: 2.13e-02, grad_scale: 32.0 2023-11-18 10:35:12,818 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:35:19,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=186500.0, ans=0.125 2023-11-18 10:35:40,984 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 3950, loss[loss=0.1019, simple_loss=0.1087, pruned_loss=0.03311, audio_tagging_loss=0.01446, over 15448.00 frames. ], tot_loss[loss=0.118, simple_loss=0.1272, pruned_loss=0.04182, audio_tagging_loss=0.01256, over 3037147.64 frames. ], batch size: 61, lr: 2.13e-02, grad_scale: 32.0 2023-11-18 10:35:41,465 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.43 vs. limit=15.0 2023-11-18 10:35:51,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=186633.33333333334, ans=0.0 2023-11-18 10:36:06,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=186766.66666666666, ans=0.125 2023-11-18 10:36:08,758 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.984e+01 9.703e+01 1.079e+02 1.244e+02 1.846e+02, threshold=2.158e+02, percent-clipped=0.0 2023-11-18 10:36:11,269 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=22.5 2023-11-18 10:36:20,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=186833.33333333334, ans=0.1 2023-11-18 10:36:37,743 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.53 vs. limit=15.0 2023-11-18 10:36:39,212 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 4000, loss[loss=0.08638, simple_loss=0.0803, pruned_loss=0.03001, audio_tagging_loss=0.01622, over 14445.00 frames. ], tot_loss[loss=0.1184, simple_loss=0.1276, pruned_loss=0.04198, audio_tagging_loss=0.01258, over 3041503.59 frames. ], batch size: 56, lr: 2.13e-02, grad_scale: 32.0 2023-11-18 10:36:40,777 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.78 vs. 
limit=15.0 2023-11-18 10:36:46,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=186966.66666666666, ans=0.5 2023-11-18 10:37:13,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=187166.66666666666, ans=0.0 2023-11-18 10:37:34,004 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 4050, loss[loss=0.1157, simple_loss=0.133, pruned_loss=0.03564, audio_tagging_loss=0.01351, over 15529.00 frames. ], tot_loss[loss=0.1195, simple_loss=0.1291, pruned_loss=0.04249, audio_tagging_loss=0.01252, over 3045119.66 frames. ], batch size: 57, lr: 2.13e-02, grad_scale: 32.0 2023-11-18 10:37:37,185 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:37:43,260 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:37:50,418 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.79 vs. limit=12.0 2023-11-18 10:38:00,301 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.024e+01 9.492e+01 1.077e+02 1.184e+02 1.546e+02, threshold=2.154e+02, percent-clipped=0.0 2023-11-18 10:38:03,027 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.71 vs. limit=15.0 2023-11-18 10:38:08,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=187500.0, ans=0.125 2023-11-18 10:38:16,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=187500.0, ans=0.0 2023-11-18 10:38:20,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=187566.66666666666, ans=0.2 2023-11-18 10:38:21,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=187566.66666666666, ans=0.07 2023-11-18 10:38:28,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=187633.33333333334, ans=0.1 2023-11-18 10:38:30,037 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 4100, loss[loss=0.1312, simple_loss=0.1539, pruned_loss=0.04577, audio_tagging_loss=0.008476, over 16070.00 frames. ], tot_loss[loss=0.1202, simple_loss=0.1299, pruned_loss=0.04283, audio_tagging_loss=0.01243, over 3048424.28 frames. 
], batch size: 57, lr: 2.13e-02, grad_scale: 32.0 2023-11-18 10:38:42,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=187700.0, ans=0.1 2023-11-18 10:38:47,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=187700.0, ans=0.2 2023-11-18 10:38:54,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=187766.66666666666, ans=0.125 2023-11-18 10:39:15,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=187900.0, ans=0.0 2023-11-18 10:39:19,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=187900.0, ans=0.1 2023-11-18 10:39:25,620 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.00 vs. limit=12.0 2023-11-18 10:39:26,026 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 4150, loss[loss=0.07897, simple_loss=0.08805, pruned_loss=0.0213, audio_tagging_loss=0.01365, over 15382.00 frames. ], tot_loss[loss=0.1203, simple_loss=0.1299, pruned_loss=0.0431, audio_tagging_loss=0.01224, over 3049315.04 frames. ], batch size: 59, lr: 2.13e-02, grad_scale: 32.0 2023-11-18 10:39:27,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=187966.66666666666, ans=0.125 2023-11-18 10:39:27,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=187966.66666666666, ans=0.2 2023-11-18 10:39:32,131 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.53 vs. limit=15.0 2023-11-18 10:39:32,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=187966.66666666666, ans=0.0 2023-11-18 10:39:43,672 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.40 vs. limit=15.0 2023-11-18 10:39:50,383 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.052e+01 9.719e+01 1.039e+02 1.185e+02 1.497e+02, threshold=2.077e+02, percent-clipped=0.0 2023-11-18 10:39:51,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=188100.0, ans=0.125 2023-11-18 10:39:59,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=188166.66666666666, ans=0.125 2023-11-18 10:40:07,369 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:40:21,052 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 4200, loss[loss=0.1159, simple_loss=0.1254, pruned_loss=0.04129, audio_tagging_loss=0.01195, over 14940.00 frames. 
], tot_loss[loss=0.1211, simple_loss=0.1313, pruned_loss=0.04338, audio_tagging_loss=0.01207, over 3045485.23 frames. ], batch size: 56, lr: 2.12e-02, grad_scale: 32.0 2023-11-18 10:40:36,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=188366.66666666666, ans=0.125 2023-11-18 10:40:43,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=188433.33333333334, ans=0.125 2023-11-18 10:40:49,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=188433.33333333334, ans=0.1 2023-11-18 10:40:53,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=188500.0, ans=0.0 2023-11-18 10:41:10,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=188566.66666666666, ans=0.125 2023-11-18 10:41:10,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=188566.66666666666, ans=0.0 2023-11-18 10:41:15,116 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 4250, loss[loss=0.126, simple_loss=0.142, pruned_loss=0.04058, audio_tagging_loss=0.01437, over 14717.00 frames. ], tot_loss[loss=0.1206, simple_loss=0.1309, pruned_loss=0.04307, audio_tagging_loss=0.01212, over 3045883.42 frames. ], batch size: 56, lr: 2.12e-02, grad_scale: 32.0 2023-11-18 10:41:16,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=188633.33333333334, ans=0.0 2023-11-18 10:41:19,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=188633.33333333334, ans=0.125 2023-11-18 10:41:20,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=188633.33333333334, ans=0.125 2023-11-18 10:41:24,362 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.52 vs. limit=15.0 2023-11-18 10:41:35,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=188700.0, ans=0.0 2023-11-18 10:41:41,658 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.045e+01 9.813e+01 1.062e+02 1.234e+02 2.396e+02, threshold=2.125e+02, percent-clipped=1.0 2023-11-18 10:41:42,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=188766.66666666666, ans=0.125 2023-11-18 10:41:52,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=188833.33333333334, ans=0.0 2023-11-18 10:42:01,691 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.60 vs. limit=15.0 2023-11-18 10:42:03,246 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.83 vs. limit=6.0 2023-11-18 10:42:12,234 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 4300, loss[loss=0.1396, simple_loss=0.1577, pruned_loss=0.05024, audio_tagging_loss=0.0105, over 15931.00 frames. 
], tot_loss[loss=0.1197, simple_loss=0.13, pruned_loss=0.04261, audio_tagging_loss=0.01206, over 3049742.30 frames. ], batch size: 59, lr: 2.12e-02, grad_scale: 32.0 2023-11-18 10:42:24,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=189033.33333333334, ans=0.125 2023-11-18 10:42:40,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=189100.0, ans=0.2 2023-11-18 10:43:07,145 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 4350, loss[loss=0.1321, simple_loss=0.1504, pruned_loss=0.04997, audio_tagging_loss=0.006927, over 15114.00 frames. ], tot_loss[loss=0.1191, simple_loss=0.1295, pruned_loss=0.04229, audio_tagging_loss=0.01203, over 3046352.57 frames. ], batch size: 55, lr: 2.12e-02, grad_scale: 32.0 2023-11-18 10:43:12,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=189300.0, ans=0.0 2023-11-18 10:43:28,261 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.18 vs. limit=12.0 2023-11-18 10:43:30,861 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.59 vs. limit=15.0 2023-11-18 10:43:33,110 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.563e+01 9.769e+01 1.106e+02 1.188e+02 1.814e+02, threshold=2.212e+02, percent-clipped=0.0 2023-11-18 10:44:01,989 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 4400, loss[loss=0.1397, simple_loss=0.1518, pruned_loss=0.05179, audio_tagging_loss=0.01203, over 16677.00 frames. ], tot_loss[loss=0.1198, simple_loss=0.1302, pruned_loss=0.04266, audio_tagging_loss=0.01207, over 3039202.05 frames. ], batch size: 62, lr: 2.12e-02, grad_scale: 64.0 2023-11-18 10:44:14,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=189700.0, ans=0.2 2023-11-18 10:44:18,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=189700.0, ans=0.125 2023-11-18 10:44:22,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=189700.0, ans=0.125 2023-11-18 10:44:24,869 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.11 vs. limit=15.0 2023-11-18 10:44:28,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=189766.66666666666, ans=0.02 2023-11-18 10:44:58,516 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 4450, loss[loss=0.07842, simple_loss=0.08251, pruned_loss=0.02212, audio_tagging_loss=0.01505, over 13502.00 frames. ], tot_loss[loss=0.1202, simple_loss=0.1304, pruned_loss=0.04297, audio_tagging_loss=0.01203, over 3042960.69 frames. 
], batch size: 53, lr: 2.12e-02, grad_scale: 64.0 2023-11-18 10:45:21,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=190100.0, ans=0.125 2023-11-18 10:45:24,127 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.708e+01 9.777e+01 1.062e+02 1.165e+02 1.734e+02, threshold=2.124e+02, percent-clipped=0.0 2023-11-18 10:45:46,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=190233.33333333334, ans=0.0 2023-11-18 10:45:49,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=190233.33333333334, ans=0.125 2023-11-18 10:45:51,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=190233.33333333334, ans=0.125 2023-11-18 10:45:53,622 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 4500, loss[loss=0.09586, simple_loss=0.1073, pruned_loss=0.03195, audio_tagging_loss=0.01028, over 15002.00 frames. ], tot_loss[loss=0.1199, simple_loss=0.1305, pruned_loss=0.04277, audio_tagging_loss=0.01189, over 3040300.81 frames. ], batch size: 56, lr: 2.11e-02, grad_scale: 32.0 2023-11-18 10:45:54,464 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.16 vs. limit=15.0 2023-11-18 10:45:56,209 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.24 vs. limit=22.5 2023-11-18 10:46:28,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=190500.0, ans=0.0 2023-11-18 10:46:32,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=190500.0, ans=0.2 2023-11-18 10:46:48,218 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 4550, loss[loss=0.1403, simple_loss=0.15, pruned_loss=0.05608, audio_tagging_loss=0.009205, over 15069.00 frames. ], tot_loss[loss=0.1185, simple_loss=0.1286, pruned_loss=0.04218, audio_tagging_loss=0.01198, over 3042995.75 frames. ], batch size: 55, lr: 2.11e-02, grad_scale: 32.0 2023-11-18 10:47:12,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=190766.66666666666, ans=0.025 2023-11-18 10:47:15,824 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 9.411e+01 1.047e+02 1.182e+02 1.787e+02, threshold=2.094e+02, percent-clipped=0.0 2023-11-18 10:47:20,711 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.49 vs. limit=10.0 2023-11-18 10:47:26,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=190833.33333333334, ans=0.125 2023-11-18 10:47:28,929 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.41 vs. limit=15.0 2023-11-18 10:47:30,614 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:47:42,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=190900.0, ans=0.0 2023-11-18 10:47:44,415 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 4600, loss[loss=0.1417, simple_loss=0.1638, pruned_loss=0.05011, audio_tagging_loss=0.009619, over 15615.00 frames. ], tot_loss[loss=0.1185, simple_loss=0.1285, pruned_loss=0.04222, audio_tagging_loss=0.012, over 3040798.25 frames. ], batch size: 57, lr: 2.11e-02, grad_scale: 32.0 2023-11-18 10:47:57,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=191033.33333333334, ans=0.125 2023-11-18 10:47:58,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=191033.33333333334, ans=0.1 2023-11-18 10:48:09,325 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.84 vs. limit=15.0 2023-11-18 10:48:12,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=191100.0, ans=0.1 2023-11-18 10:48:28,321 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.69 vs. limit=6.0 2023-11-18 10:48:30,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=191233.33333333334, ans=0.0 2023-11-18 10:48:33,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=191233.33333333334, ans=0.125 2023-11-18 10:48:40,140 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 4650, loss[loss=0.1273, simple_loss=0.1383, pruned_loss=0.04867, audio_tagging_loss=0.009441, over 14372.00 frames. ], tot_loss[loss=0.119, simple_loss=0.1294, pruned_loss=0.04231, audio_tagging_loss=0.01204, over 3039974.37 frames. ], batch size: 53, lr: 2.11e-02, grad_scale: 32.0 2023-11-18 10:48:43,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=191300.0, ans=0.04949747468305833 2023-11-18 10:48:50,286 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.37 vs. 
limit=15.0 2023-11-18 10:49:06,099 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 9.958e+01 1.111e+02 1.228e+02 2.300e+02, threshold=2.222e+02, percent-clipped=1.0 2023-11-18 10:49:12,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=191500.0, ans=10.0 2023-11-18 10:49:12,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=191500.0, ans=0.04949747468305833 2023-11-18 10:49:24,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=191566.66666666666, ans=0.125 2023-11-18 10:49:26,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=191566.66666666666, ans=0.0 2023-11-18 10:49:34,885 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 4700, loss[loss=0.1239, simple_loss=0.1384, pruned_loss=0.03968, audio_tagging_loss=0.01504, over 15175.00 frames. ], tot_loss[loss=0.1199, simple_loss=0.1301, pruned_loss=0.04282, audio_tagging_loss=0.01207, over 3046951.47 frames. ], batch size: 57, lr: 2.11e-02, grad_scale: 32.0 2023-11-18 10:49:43,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=191633.33333333334, ans=0.125 2023-11-18 10:49:49,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=191700.0, ans=0.125 2023-11-18 10:50:03,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=191766.66666666666, ans=0.0 2023-11-18 10:50:05,179 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.55 vs. limit=15.0 2023-11-18 10:50:12,210 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.50 vs. limit=15.0 2023-11-18 10:50:23,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=191900.0, ans=0.1 2023-11-18 10:50:30,212 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 4750, loss[loss=0.135, simple_loss=0.1444, pruned_loss=0.05239, audio_tagging_loss=0.01044, over 14198.00 frames. ], tot_loss[loss=0.1195, simple_loss=0.1295, pruned_loss=0.04265, audio_tagging_loss=0.01214, over 3044346.56 frames. ], batch size: 54, lr: 2.11e-02, grad_scale: 32.0 2023-11-18 10:50:34,762 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.19 vs. limit=15.0 2023-11-18 10:50:42,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=192033.33333333334, ans=0.125 2023-11-18 10:50:57,122 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.711e+01 9.880e+01 1.110e+02 1.323e+02 1.950e+02, threshold=2.220e+02, percent-clipped=0.0 2023-11-18 10:51:24,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=192233.33333333334, ans=0.125 2023-11-18 10:51:26,446 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 4800, loss[loss=0.1012, simple_loss=0.1148, pruned_loss=0.03082, audio_tagging_loss=0.01296, over 16484.00 frames. 
], tot_loss[loss=0.1195, simple_loss=0.1296, pruned_loss=0.04245, audio_tagging_loss=0.01228, over 3048113.25 frames. ], batch size: 61, lr: 2.10e-02, grad_scale: 32.0 2023-11-18 10:51:40,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=192366.66666666666, ans=0.125 2023-11-18 10:51:44,050 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.05 vs. limit=22.5 2023-11-18 10:51:47,853 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.08 vs. limit=10.0 2023-11-18 10:51:59,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=192500.0, ans=0.2 2023-11-18 10:52:05,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=192500.0, ans=0.125 2023-11-18 10:52:07,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=192500.0, ans=0.1 2023-11-18 10:52:18,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=192566.66666666666, ans=0.0 2023-11-18 10:52:21,050 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 4850, loss[loss=0.1012, simple_loss=0.1167, pruned_loss=0.03242, audio_tagging_loss=0.01047, over 15449.00 frames. ], tot_loss[loss=0.1185, simple_loss=0.1282, pruned_loss=0.04194, audio_tagging_loss=0.01241, over 3051493.42 frames. ], batch size: 57, lr: 2.10e-02, grad_scale: 32.0 2023-11-18 10:52:23,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=192633.33333333334, ans=0.0 2023-11-18 10:52:24,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=192633.33333333334, ans=0.0 2023-11-18 10:52:34,596 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.15 vs. limit=15.0 2023-11-18 10:52:37,074 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:52:44,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=192766.66666666666, ans=0.125 2023-11-18 10:52:47,760 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.311e+01 9.557e+01 1.060e+02 1.196e+02 2.281e+02, threshold=2.120e+02, percent-clipped=1.0 2023-11-18 10:53:15,991 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 4900, loss[loss=0.1026, simple_loss=0.1082, pruned_loss=0.03636, audio_tagging_loss=0.01211, over 15008.00 frames. ], tot_loss[loss=0.1183, simple_loss=0.1279, pruned_loss=0.04189, audio_tagging_loss=0.01249, over 3056386.48 frames. 
], batch size: 57, lr: 2.10e-02, grad_scale: 32.0 2023-11-18 10:53:18,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=192966.66666666666, ans=0.0 2023-11-18 10:53:42,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=193100.0, ans=0.125 2023-11-18 10:53:47,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=193100.0, ans=0.125 2023-11-18 10:53:50,444 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.58 vs. limit=10.0 2023-11-18 10:53:52,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=193166.66666666666, ans=0.0 2023-11-18 10:53:57,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=193166.66666666666, ans=0.1 2023-11-18 10:54:11,449 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 4950, loss[loss=0.123, simple_loss=0.1243, pruned_loss=0.04432, audio_tagging_loss=0.01649, over 15638.00 frames. ], tot_loss[loss=0.1184, simple_loss=0.1281, pruned_loss=0.04206, audio_tagging_loss=0.01226, over 3053256.17 frames. ], batch size: 58, lr: 2.10e-02, grad_scale: 32.0 2023-11-18 10:54:30,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=193366.66666666666, ans=0.2 2023-11-18 10:54:37,971 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.127e+01 9.494e+01 1.131e+02 1.249e+02 1.755e+02, threshold=2.261e+02, percent-clipped=0.0 2023-11-18 10:54:40,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=193433.33333333334, ans=0.125 2023-11-18 10:54:54,841 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:54:59,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=193566.66666666666, ans=0.0 2023-11-18 10:55:06,963 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 5000, loss[loss=0.07131, simple_loss=0.07403, pruned_loss=0.02075, audio_tagging_loss=0.01355, over 16184.00 frames. ], tot_loss[loss=0.1188, simple_loss=0.129, pruned_loss=0.0422, audio_tagging_loss=0.01208, over 3052588.35 frames. ], batch size: 63, lr: 2.10e-02, grad_scale: 32.0 2023-11-18 10:55:07,451 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.00 vs. limit=22.5 2023-11-18 10:55:11,740 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.10 vs. limit=12.0 2023-11-18 10:55:12,791 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.35 vs. 
limit=22.5 2023-11-18 10:55:21,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=193700.0, ans=0.2 2023-11-18 10:55:23,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=193700.0, ans=0.125 2023-11-18 10:55:24,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.53 vs. limit=10.0 2023-11-18 10:55:50,104 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:56:02,101 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 5050, loss[loss=0.1037, simple_loss=0.1133, pruned_loss=0.0376, audio_tagging_loss=0.009464, over 14090.00 frames. ], tot_loss[loss=0.1184, simple_loss=0.1285, pruned_loss=0.04213, audio_tagging_loss=0.01205, over 3045624.98 frames. ], batch size: 55, lr: 2.09e-02, grad_scale: 32.0 2023-11-18 10:56:03,670 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.68 vs. limit=15.0 2023-11-18 10:56:19,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=194033.33333333334, ans=0.07 2023-11-18 10:56:20,651 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.49 vs. limit=22.5 2023-11-18 10:56:28,903 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.402e+01 1.006e+02 1.111e+02 1.230e+02 2.145e+02, threshold=2.223e+02, percent-clipped=0.0 2023-11-18 10:56:29,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=194100.0, ans=0.0 2023-11-18 10:56:34,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=194166.66666666666, ans=0.0 2023-11-18 10:56:35,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=194166.66666666666, ans=0.125 2023-11-18 10:56:56,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=194233.33333333334, ans=0.07 2023-11-18 10:56:57,806 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 5100, loss[loss=0.09149, simple_loss=0.09047, pruned_loss=0.03121, audio_tagging_loss=0.01505, over 14470.00 frames. ], tot_loss[loss=0.1166, simple_loss=0.1265, pruned_loss=0.04116, audio_tagging_loss=0.01214, over 3041147.63 frames. ], batch size: 56, lr: 2.09e-02, grad_scale: 32.0 2023-11-18 10:57:15,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=194366.66666666666, ans=0.0 2023-11-18 10:57:24,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=194433.33333333334, ans=0.1 2023-11-18 10:57:27,751 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.65 vs. 
limit=22.5 2023-11-18 10:57:32,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=194500.0, ans=0.125 2023-11-18 10:57:52,359 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 5150, loss[loss=0.09922, simple_loss=0.1078, pruned_loss=0.03267, audio_tagging_loss=0.01266, over 15025.00 frames. ], tot_loss[loss=0.1169, simple_loss=0.1269, pruned_loss=0.04125, audio_tagging_loss=0.01219, over 3046048.31 frames. ], batch size: 57, lr: 2.09e-02, grad_scale: 32.0 2023-11-18 10:57:57,292 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:58:02,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=194700.0, ans=0.0 2023-11-18 10:58:05,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=194700.0, ans=0.125 2023-11-18 10:58:20,056 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.587e+01 9.533e+01 1.047e+02 1.145e+02 1.744e+02, threshold=2.094e+02, percent-clipped=0.0 2023-11-18 10:58:33,985 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:58:35,338 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.71 vs. limit=12.0 2023-11-18 10:58:48,374 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 5200, loss[loss=0.1543, simple_loss=0.167, pruned_loss=0.05762, audio_tagging_loss=0.01321, over 15882.00 frames. ], tot_loss[loss=0.1172, simple_loss=0.1275, pruned_loss=0.04144, audio_tagging_loss=0.01206, over 3046348.91 frames. ], batch size: 59, lr: 2.09e-02, grad_scale: 32.0 2023-11-18 10:59:24,022 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.18 vs. limit=22.5 2023-11-18 10:59:25,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=195166.66666666666, ans=0.1 2023-11-18 10:59:40,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=195233.33333333334, ans=0.2 2023-11-18 10:59:41,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=195233.33333333334, ans=0.125 2023-11-18 10:59:41,655 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.99 vs. limit=15.0 2023-11-18 10:59:44,061 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 5250, loss[loss=0.1704, simple_loss=0.1857, pruned_loss=0.06718, audio_tagging_loss=0.01037, over 15976.00 frames. ], tot_loss[loss=0.1171, simple_loss=0.1272, pruned_loss=0.04138, audio_tagging_loss=0.01208, over 3044880.70 frames. ], batch size: 57, lr: 2.09e-02, grad_scale: 32.0 2023-11-18 10:59:47,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=195300.0, ans=0.125 2023-11-18 10:59:58,278 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.88 vs. 
limit=22.5 2023-11-18 11:00:09,747 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.048e+01 9.706e+01 1.086e+02 1.165e+02 1.723e+02, threshold=2.171e+02, percent-clipped=0.0 2023-11-18 11:00:11,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=195433.33333333334, ans=22.5 2023-11-18 11:00:13,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=195433.33333333334, ans=0.04949747468305833 2023-11-18 11:00:27,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=195566.66666666666, ans=0.125 2023-11-18 11:00:33,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=195566.66666666666, ans=0.05 2023-11-18 11:00:35,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=195566.66666666666, ans=0.125 2023-11-18 11:00:38,225 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.46 vs. limit=15.0 2023-11-18 11:00:38,722 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 5300, loss[loss=0.1186, simple_loss=0.112, pruned_loss=0.04571, audio_tagging_loss=0.01692, over 15172.00 frames. ], tot_loss[loss=0.1179, simple_loss=0.1278, pruned_loss=0.04186, audio_tagging_loss=0.01208, over 3048845.76 frames. ], batch size: 59, lr: 2.09e-02, grad_scale: 32.0 2023-11-18 11:00:42,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=195633.33333333334, ans=0.0 2023-11-18 11:00:56,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=195700.0, ans=0.0 2023-11-18 11:01:12,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=195833.33333333334, ans=0.02 2023-11-18 11:01:33,818 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 5350, loss[loss=0.1198, simple_loss=0.1258, pruned_loss=0.04374, audio_tagging_loss=0.01315, over 15290.00 frames. ], tot_loss[loss=0.1183, simple_loss=0.1286, pruned_loss=0.04194, audio_tagging_loss=0.01209, over 3045546.38 frames. ], batch size: 56, lr: 2.08e-02, grad_scale: 32.0 2023-11-18 11:01:38,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=195966.66666666666, ans=0.125 2023-11-18 11:01:41,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=195966.66666666666, ans=0.09899494936611666 2023-11-18 11:01:42,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=195966.66666666666, ans=0.125 2023-11-18 11:01:47,791 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.59 vs. 
limit=6.0 2023-11-18 11:02:00,850 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.505e+01 9.740e+01 1.103e+02 1.236e+02 1.942e+02, threshold=2.206e+02, percent-clipped=0.0 2023-11-18 11:02:30,252 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 5400, loss[loss=0.1095, simple_loss=0.127, pruned_loss=0.03419, audio_tagging_loss=0.0118, over 14578.00 frames. ], tot_loss[loss=0.1184, simple_loss=0.1287, pruned_loss=0.04188, audio_tagging_loss=0.01219, over 3042847.82 frames. ], batch size: 54, lr: 2.08e-02, grad_scale: 32.0 2023-11-18 11:02:30,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=196300.0, ans=0.125 2023-11-18 11:02:46,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=196366.66666666666, ans=0.0 2023-11-18 11:03:24,824 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 5450, loss[loss=0.1169, simple_loss=0.1262, pruned_loss=0.04248, audio_tagging_loss=0.01134, over 15587.00 frames. ], tot_loss[loss=0.1184, simple_loss=0.1289, pruned_loss=0.04178, audio_tagging_loss=0.01215, over 3041846.35 frames. ], batch size: 60, lr: 2.08e-02, grad_scale: 32.0 2023-11-18 11:03:34,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=196700.0, ans=0.0 2023-11-18 11:03:51,182 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.469e+01 9.475e+01 1.043e+02 1.232e+02 1.692e+02, threshold=2.085e+02, percent-clipped=0.0 2023-11-18 11:03:54,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=196766.66666666666, ans=0.125 2023-11-18 11:03:57,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=196833.33333333334, ans=0.125 2023-11-18 11:03:58,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=196833.33333333334, ans=0.1 2023-11-18 11:04:04,865 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.27 vs. limit=15.0 2023-11-18 11:04:19,144 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 5500, loss[loss=0.09513, simple_loss=0.1045, pruned_loss=0.03204, audio_tagging_loss=0.01084, over 15332.00 frames. ], tot_loss[loss=0.1173, simple_loss=0.1276, pruned_loss=0.04131, audio_tagging_loss=0.01221, over 3037030.14 frames. ], batch size: 59, lr: 2.08e-02, grad_scale: 32.0 2023-11-18 11:05:15,100 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 5550, loss[loss=0.1033, simple_loss=0.1129, pruned_loss=0.03481, audio_tagging_loss=0.01209, over 15664.00 frames. ], tot_loss[loss=0.1179, simple_loss=0.128, pruned_loss=0.04154, audio_tagging_loss=0.01235, over 3037160.40 frames. 
], batch size: 58, lr: 2.08e-02, grad_scale: 32.0 2023-11-18 11:05:37,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=197433.33333333334, ans=0.0 2023-11-18 11:05:41,238 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.765e+01 9.422e+01 1.021e+02 1.116e+02 1.524e+02, threshold=2.042e+02, percent-clipped=0.0 2023-11-18 11:05:45,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=197433.33333333334, ans=0.1 2023-11-18 11:05:55,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=197500.0, ans=0.2 2023-11-18 11:05:55,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=197500.0, ans=0.0 2023-11-18 11:06:05,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=197566.66666666666, ans=0.125 2023-11-18 11:06:10,798 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 5600, loss[loss=0.09989, simple_loss=0.1073, pruned_loss=0.02976, audio_tagging_loss=0.01647, over 14892.00 frames. ], tot_loss[loss=0.117, simple_loss=0.1269, pruned_loss=0.04104, audio_tagging_loss=0.01253, over 3043439.55 frames. ], batch size: 56, lr: 2.08e-02, grad_scale: 32.0 2023-11-18 11:06:12,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=197633.33333333334, ans=0.125 2023-11-18 11:06:15,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=197633.33333333334, ans=0.0 2023-11-18 11:06:45,424 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.18 vs. limit=15.0 2023-11-18 11:06:49,903 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 11:06:51,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=197833.33333333334, ans=0.0 2023-11-18 11:06:52,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=197833.33333333334, ans=0.0 2023-11-18 11:06:55,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=197900.0, ans=0.04949747468305833 2023-11-18 11:07:05,550 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 5650, loss[loss=0.1027, simple_loss=0.107, pruned_loss=0.03009, audio_tagging_loss=0.01907, over 14317.00 frames. ], tot_loss[loss=0.1169, simple_loss=0.1268, pruned_loss=0.04101, audio_tagging_loss=0.01246, over 3044222.02 frames. 
], batch size: 55, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:07:07,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=197966.66666666666, ans=0.015 2023-11-18 11:07:24,520 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.53 vs. limit=15.0 2023-11-18 11:07:32,365 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.586e+01 9.472e+01 1.054e+02 1.179e+02 1.784e+02, threshold=2.108e+02, percent-clipped=0.0 2023-11-18 11:07:37,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=198166.66666666666, ans=0.2 2023-11-18 11:07:47,493 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.46 vs. limit=10.0 2023-11-18 11:07:57,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=198233.33333333334, ans=0.125 2023-11-18 11:08:01,316 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 5700, loss[loss=0.11, simple_loss=0.1302, pruned_loss=0.03347, audio_tagging_loss=0.01145, over 15400.00 frames. ], tot_loss[loss=0.116, simple_loss=0.126, pruned_loss=0.0406, audio_tagging_loss=0.01245, over 3049875.88 frames. ], batch size: 56, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:08:14,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=198366.66666666666, ans=0.125 2023-11-18 11:08:17,170 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.46 vs. limit=12.0 2023-11-18 11:08:32,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=198500.0, ans=0.0 2023-11-18 11:08:43,406 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.14 vs. limit=15.0 2023-11-18 11:08:47,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=198566.66666666666, ans=0.125 2023-11-18 11:08:52,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=198566.66666666666, ans=0.2 2023-11-18 11:08:56,301 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 5750, loss[loss=0.1052, simple_loss=0.1128, pruned_loss=0.03985, audio_tagging_loss=0.008985, over 14630.00 frames. ], tot_loss[loss=0.116, simple_loss=0.1261, pruned_loss=0.04062, audio_tagging_loss=0.01234, over 3046942.00 frames. ], batch size: 54, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:09:10,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=198700.0, ans=0.125 2023-11-18 11:09:22,454 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.260e+01 9.937e+01 1.145e+02 1.295e+02 2.386e+02, threshold=2.290e+02, percent-clipped=2.0 2023-11-18 11:09:29,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=198833.33333333334, ans=0.125 2023-11-18 11:09:50,850 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 5800, loss[loss=0.09893, simple_loss=0.1019, pruned_loss=0.03467, audio_tagging_loss=0.01332, over 15950.00 frames. 
], tot_loss[loss=0.1156, simple_loss=0.1257, pruned_loss=0.0406, audio_tagging_loss=0.01216, over 3048788.79 frames. ], batch size: 62, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:09:52,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=198966.66666666666, ans=0.125 2023-11-18 11:09:59,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=198966.66666666666, ans=0.125 2023-11-18 11:10:31,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=199166.66666666666, ans=0.125 2023-11-18 11:10:44,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=199233.33333333334, ans=0.1 2023-11-18 11:10:45,924 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 5850, loss[loss=0.149, simple_loss=0.1698, pruned_loss=0.05552, audio_tagging_loss=0.008523, over 15958.00 frames. ], tot_loss[loss=0.1153, simple_loss=0.125, pruned_loss=0.04063, audio_tagging_loss=0.01214, over 3049998.71 frames. ], batch size: 57, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:10:48,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=199300.0, ans=0.125 2023-11-18 11:11:03,108 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.51 vs. limit=15.0 2023-11-18 11:11:12,884 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.688e+01 9.939e+01 1.135e+02 1.295e+02 1.954e+02, threshold=2.270e+02, percent-clipped=0.0 2023-11-18 11:11:17,781 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0 2023-11-18 11:11:18,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=199500.0, ans=0.2 2023-11-18 11:11:38,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=199566.66666666666, ans=0.125 2023-11-18 11:11:42,582 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 5900, loss[loss=0.1204, simple_loss=0.1336, pruned_loss=0.0411, audio_tagging_loss=0.01248, over 15509.00 frames. ], tot_loss[loss=0.1159, simple_loss=0.1258, pruned_loss=0.04088, audio_tagging_loss=0.01209, over 3053273.23 frames. ], batch size: 58, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:11:44,088 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.77 vs. 
limit=15.0 2023-11-18 11:11:44,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=199633.33333333334, ans=0.1 2023-11-18 11:11:58,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=199700.0, ans=0.125 2023-11-18 11:12:06,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=199766.66666666666, ans=0.1 2023-11-18 11:12:10,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=199766.66666666666, ans=0.125 2023-11-18 11:12:13,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=199766.66666666666, ans=0.125 2023-11-18 11:12:24,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=199833.33333333334, ans=0.1 2023-11-18 11:12:25,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=199900.0, ans=0.2 2023-11-18 11:12:35,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=199900.0, ans=0.1 2023-11-18 11:12:36,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=199966.66666666666, ans=0.125 2023-11-18 11:12:37,049 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 5950, loss[loss=0.1107, simple_loss=0.1157, pruned_loss=0.04019, audio_tagging_loss=0.01271, over 14954.00 frames. ], tot_loss[loss=0.1159, simple_loss=0.1261, pruned_loss=0.04077, audio_tagging_loss=0.01202, over 3053026.07 frames. ], batch size: 56, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:12:42,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=199966.66666666666, ans=0.125 2023-11-18 11:12:43,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=199966.66666666666, ans=0.125 2023-11-18 11:12:47,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=200033.33333333334, ans=0.125 2023-11-18 11:13:04,192 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 9.491e+01 1.040e+02 1.180e+02 1.802e+02, threshold=2.079e+02, percent-clipped=0.0 2023-11-18 11:13:15,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=200166.66666666666, ans=0.04949747468305833 2023-11-18 11:13:24,300 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.88 vs. limit=22.5 2023-11-18 11:13:26,501 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.13 vs. 
limit=10.0 2023-11-18 11:13:31,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=200300.0, ans=0.0 2023-11-18 11:13:32,553 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 6000, loss[loss=0.1222, simple_loss=0.1254, pruned_loss=0.04432, audio_tagging_loss=0.01513, over 14827.00 frames. ], tot_loss[loss=0.1171, simple_loss=0.1279, pruned_loss=0.04126, audio_tagging_loss=0.01186, over 3054355.41 frames. ], batch size: 56, lr: 2.06e-02, grad_scale: 32.0 2023-11-18 11:13:32,554 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-18 11:13:46,922 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.6069, 3.2991, 3.5640, 3.0100], device='cuda:3') 2023-11-18 11:14:05,624 INFO [train_asr.py:1147] (3/4) Epoch 3, validation: loss=0.08054, simple_loss=0.06533, pruned_loss=0.01225, audio_tagging_loss=0.03562, over 4681554.00 frames. 2023-11-18 11:14:05,625 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-18 11:14:31,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=200433.33333333334, ans=0.125 2023-11-18 11:14:45,165 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 11:14:45,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=200500.0, ans=0.125 2023-11-18 11:15:00,402 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 6050, loss[loss=0.1108, simple_loss=0.1224, pruned_loss=0.03868, audio_tagging_loss=0.01092, over 14573.00 frames. ], tot_loss[loss=0.1165, simple_loss=0.1273, pruned_loss=0.04103, audio_tagging_loss=0.01187, over 3046755.55 frames. ], batch size: 54, lr: 2.06e-02, grad_scale: 32.0 2023-11-18 11:15:17,941 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.71 vs. limit=22.5 2023-11-18 11:15:27,197 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.571e+01 9.505e+01 1.054e+02 1.196e+02 1.657e+02, threshold=2.108e+02, percent-clipped=0.0 2023-11-18 11:15:27,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=200766.66666666666, ans=0.5 2023-11-18 11:15:36,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=200833.33333333334, ans=0.125 2023-11-18 11:15:36,818 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.90 vs. limit=15.0 2023-11-18 11:15:41,111 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=25.45 vs. 
limit=22.5 2023-11-18 11:15:50,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=200900.0, ans=0.125 2023-11-18 11:15:55,688 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 6100, loss[loss=0.1253, simple_loss=0.1404, pruned_loss=0.04463, audio_tagging_loss=0.01047, over 16391.00 frames. ], tot_loss[loss=0.1161, simple_loss=0.1267, pruned_loss=0.04089, audio_tagging_loss=0.01192, over 3047844.88 frames. ], batch size: 61, lr: 2.06e-02, grad_scale: 32.0 2023-11-18 11:15:57,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=200966.66666666666, ans=0.0 2023-11-18 11:15:59,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=200966.66666666666, ans=0.125 2023-11-18 11:16:09,940 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.91 vs. limit=22.5 2023-11-18 11:16:18,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=201100.0, ans=0.0 2023-11-18 11:16:21,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=201100.0, ans=0.125 2023-11-18 11:16:26,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=201100.0, ans=0.1 2023-11-18 11:16:38,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=201166.66666666666, ans=0.125 2023-11-18 11:16:51,860 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 6150, loss[loss=0.09865, simple_loss=0.1091, pruned_loss=0.03215, audio_tagging_loss=0.01194, over 15547.00 frames. ], tot_loss[loss=0.1161, simple_loss=0.1264, pruned_loss=0.04094, audio_tagging_loss=0.01193, over 3049777.31 frames. 
], batch size: 56, lr: 2.06e-02, grad_scale: 32.0 2023-11-18 11:16:53,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=201300.0, ans=0.125 2023-11-18 11:16:56,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=201300.0, ans=0.09899494936611666 2023-11-18 11:17:17,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=201433.33333333334, ans=0.0 2023-11-18 11:17:18,532 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.990e+01 9.824e+01 1.100e+02 1.227e+02 1.879e+02, threshold=2.200e+02, percent-clipped=0.0 2023-11-18 11:17:22,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=201433.33333333334, ans=0.1 2023-11-18 11:17:24,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=201500.0, ans=0.125 2023-11-18 11:17:26,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=201500.0, ans=0.025 2023-11-18 11:17:28,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=201500.0, ans=0.125 2023-11-18 11:17:28,898 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.40 vs. limit=15.0 2023-11-18 11:17:29,720 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.90 vs. limit=15.0 2023-11-18 11:17:47,682 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 6200, loss[loss=0.09358, simple_loss=0.09551, pruned_loss=0.03086, audio_tagging_loss=0.01497, over 15582.00 frames. ], tot_loss[loss=0.1167, simple_loss=0.127, pruned_loss=0.04116, audio_tagging_loss=0.01205, over 3052803.77 frames. ], batch size: 60, lr: 2.06e-02, grad_scale: 32.0 2023-11-18 11:18:03,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=201700.0, ans=0.04949747468305833 2023-11-18 11:18:10,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=201766.66666666666, ans=0.1 2023-11-18 11:18:19,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=201766.66666666666, ans=0.125 2023-11-18 11:18:31,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=201900.0, ans=0.125 2023-11-18 11:18:43,323 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 6250, loss[loss=0.1333, simple_loss=0.1291, pruned_loss=0.05158, audio_tagging_loss=0.01718, over 14583.00 frames. ], tot_loss[loss=0.118, simple_loss=0.1283, pruned_loss=0.04165, audio_tagging_loss=0.01223, over 3045755.12 frames. 
], batch size: 55, lr: 2.06e-02, grad_scale: 32.0 2023-11-18 11:18:58,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=202033.33333333334, ans=0.0 2023-11-18 11:19:03,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=202033.33333333334, ans=0.125 2023-11-18 11:19:10,111 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.583e+01 9.428e+01 1.017e+02 1.154e+02 1.739e+02, threshold=2.034e+02, percent-clipped=0.0 2023-11-18 11:19:39,067 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 6300, loss[loss=0.09073, simple_loss=0.1015, pruned_loss=0.02856, audio_tagging_loss=0.01142, over 15975.00 frames. ], tot_loss[loss=0.1191, simple_loss=0.1292, pruned_loss=0.04215, audio_tagging_loss=0.01232, over 3047143.96 frames. ], batch size: 62, lr: 2.05e-02, grad_scale: 32.0 2023-11-18 11:19:41,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=202300.0, ans=0.1 2023-11-18 11:19:41,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=202300.0, ans=0.125 2023-11-18 11:20:34,518 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 6350, loss[loss=0.1248, simple_loss=0.1317, pruned_loss=0.04399, audio_tagging_loss=0.01501, over 14656.00 frames. ], tot_loss[loss=0.119, simple_loss=0.1289, pruned_loss=0.04214, audio_tagging_loss=0.01243, over 3051942.52 frames. ], batch size: 57, lr: 2.05e-02, grad_scale: 32.0 2023-11-18 11:20:37,182 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.24 vs. limit=15.0 2023-11-18 11:20:38,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=202633.33333333334, ans=0.1 2023-11-18 11:20:38,227 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.71 vs. limit=22.5 2023-11-18 11:20:44,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=202700.0, ans=15.0 2023-11-18 11:20:46,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=202700.0, ans=0.0 2023-11-18 11:20:53,247 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.78 vs. limit=15.0 2023-11-18 11:20:54,127 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.59 vs. 
limit=22.5 2023-11-18 11:20:54,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=202700.0, ans=0.95 2023-11-18 11:21:01,567 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.867e+01 9.831e+01 1.084e+02 1.220e+02 1.699e+02, threshold=2.169e+02, percent-clipped=0.0 2023-11-18 11:21:01,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=202766.66666666666, ans=0.1 2023-11-18 11:21:04,237 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=13.09 vs. limit=12.0 2023-11-18 11:21:17,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=202833.33333333334, ans=0.125 2023-11-18 11:21:28,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=202900.0, ans=0.0 2023-11-18 11:21:29,927 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 6400, loss[loss=0.1056, simple_loss=0.1189, pruned_loss=0.03356, audio_tagging_loss=0.01262, over 14662.00 frames. ], tot_loss[loss=0.118, simple_loss=0.1279, pruned_loss=0.04156, audio_tagging_loss=0.01248, over 3050327.69 frames. ], batch size: 56, lr: 2.05e-02, grad_scale: 32.0 2023-11-18 11:21:32,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=202966.66666666666, ans=0.0 2023-11-18 11:21:44,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=203033.33333333334, ans=0.1 2023-11-18 11:21:47,499 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.29 vs. limit=15.0 2023-11-18 11:21:47,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=203033.33333333334, ans=0.125 2023-11-18 11:21:54,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=203100.0, ans=0.2 2023-11-18 11:22:03,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=203166.66666666666, ans=0.125 2023-11-18 11:22:04,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=203166.66666666666, ans=0.1 2023-11-18 11:22:25,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=203300.0, ans=0.0 2023-11-18 11:22:25,997 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 6450, loss[loss=0.1375, simple_loss=0.1545, pruned_loss=0.04987, audio_tagging_loss=0.01031, over 14866.00 frames. ], tot_loss[loss=0.1172, simple_loss=0.1269, pruned_loss=0.04117, audio_tagging_loss=0.01253, over 3051926.76 frames. 
], batch size: 54, lr: 2.05e-02, grad_scale: 32.0 2023-11-18 11:22:26,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=203300.0, ans=0.0 2023-11-18 11:22:35,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=203366.66666666666, ans=0.0 2023-11-18 11:22:37,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=203366.66666666666, ans=0.1 2023-11-18 11:22:52,365 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 9.763e+01 1.082e+02 1.171e+02 1.453e+02, threshold=2.164e+02, percent-clipped=0.0 2023-11-18 11:23:21,052 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 6500, loss[loss=0.1495, simple_loss=0.1625, pruned_loss=0.05609, audio_tagging_loss=0.0122, over 16478.00 frames. ], tot_loss[loss=0.1177, simple_loss=0.1278, pruned_loss=0.04144, audio_tagging_loss=0.01233, over 3051993.09 frames. ], batch size: 60, lr: 2.05e-02, grad_scale: 64.0 2023-11-18 11:23:23,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=203633.33333333334, ans=0.1 2023-11-18 11:23:33,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=203700.0, ans=0.125 2023-11-18 11:23:36,285 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.18 vs. limit=10.0 2023-11-18 11:23:37,360 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.02 vs. limit=22.5 2023-11-18 11:23:46,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=203766.66666666666, ans=0.125 2023-11-18 11:24:11,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=203900.0, ans=0.1 2023-11-18 11:24:17,219 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 6550, loss[loss=0.1422, simple_loss=0.1557, pruned_loss=0.05362, audio_tagging_loss=0.01076, over 14982.00 frames. ], tot_loss[loss=0.1181, simple_loss=0.1286, pruned_loss=0.04166, audio_tagging_loss=0.01213, over 3053180.82 frames. ], batch size: 56, lr: 2.05e-02, grad_scale: 64.0 2023-11-18 11:24:43,887 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.972e+01 9.621e+01 1.067e+02 1.227e+02 1.729e+02, threshold=2.134e+02, percent-clipped=0.0 2023-11-18 11:24:47,266 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 11:24:47,693 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.09 vs. limit=15.0 2023-11-18 11:24:53,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=204166.66666666666, ans=0.125 2023-11-18 11:24:55,342 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.54 vs. 
limit=12.0 2023-11-18 11:25:12,970 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.10 vs. limit=15.0 2023-11-18 11:25:13,364 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 6600, loss[loss=0.1014, simple_loss=0.1226, pruned_loss=0.03038, audio_tagging_loss=0.009656, over 15512.00 frames. ], tot_loss[loss=0.1176, simple_loss=0.1279, pruned_loss=0.04157, audio_tagging_loss=0.01205, over 3051759.25 frames. ], batch size: 56, lr: 2.04e-02, grad_scale: 64.0 2023-11-18 11:25:15,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=204300.0, ans=0.0 2023-11-18 11:25:21,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=204300.0, ans=0.09899494936611666 2023-11-18 11:25:26,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=204366.66666666666, ans=0.125 2023-11-18 11:25:42,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=204433.33333333334, ans=0.0 2023-11-18 11:25:55,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=204500.0, ans=0.2 2023-11-18 11:26:00,262 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 11:26:07,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=204633.33333333334, ans=0.1 2023-11-18 11:26:08,405 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 6650, loss[loss=0.09837, simple_loss=0.09517, pruned_loss=0.03738, audio_tagging_loss=0.01341, over 14298.00 frames. ], tot_loss[loss=0.1171, simple_loss=0.127, pruned_loss=0.04139, audio_tagging_loss=0.0122, over 3045914.71 frames. 
], batch size: 53, lr: 2.04e-02, grad_scale: 64.0 2023-11-18 11:26:09,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=204633.33333333334, ans=0.0 2023-11-18 11:26:23,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=204700.0, ans=0.125 2023-11-18 11:26:34,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=204766.66666666666, ans=0.125 2023-11-18 11:26:35,312 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.502e+01 9.475e+01 1.025e+02 1.163e+02 1.869e+02, threshold=2.050e+02, percent-clipped=0.0 2023-11-18 11:26:36,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=204766.66666666666, ans=0.125 2023-11-18 11:26:44,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=204833.33333333334, ans=0.0 2023-11-18 11:26:48,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=204833.33333333334, ans=0.1 2023-11-18 11:26:51,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=204900.0, ans=0.0 2023-11-18 11:26:52,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=204900.0, ans=0.125 2023-11-18 11:26:54,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=204900.0, ans=0.0 2023-11-18 11:26:56,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=204900.0, ans=0.125 2023-11-18 11:27:00,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=204900.0, ans=0.0 2023-11-18 11:27:03,167 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 6700, loss[loss=0.1133, simple_loss=0.1205, pruned_loss=0.04166, audio_tagging_loss=0.01141, over 14518.00 frames. ], tot_loss[loss=0.1172, simple_loss=0.1276, pruned_loss=0.04131, audio_tagging_loss=0.0121, over 3043132.78 frames. ], batch size: 55, lr: 2.04e-02, grad_scale: 64.0 2023-11-18 11:27:08,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=204966.66666666666, ans=10.0 2023-11-18 11:27:33,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=205100.0, ans=0.125 2023-11-18 11:27:59,109 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 6750, loss[loss=0.1187, simple_loss=0.1211, pruned_loss=0.04649, audio_tagging_loss=0.01164, over 15098.00 frames. ], tot_loss[loss=0.1165, simple_loss=0.1265, pruned_loss=0.04109, audio_tagging_loss=0.01211, over 3041431.78 frames. 
], batch size: 58, lr: 2.04e-02, grad_scale: 64.0 2023-11-18 11:28:03,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=205300.0, ans=0.125 2023-11-18 11:28:25,171 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 9.647e+01 1.086e+02 1.295e+02 2.076e+02, threshold=2.172e+02, percent-clipped=1.0 2023-11-18 11:28:30,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=205433.33333333334, ans=0.0 2023-11-18 11:28:31,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=205500.0, ans=0.0 2023-11-18 11:28:37,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=205500.0, ans=0.2 2023-11-18 11:28:54,658 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 6800, loss[loss=0.1299, simple_loss=0.1331, pruned_loss=0.04824, audio_tagging_loss=0.01514, over 16699.00 frames. ], tot_loss[loss=0.117, simple_loss=0.1269, pruned_loss=0.04141, audio_tagging_loss=0.0121, over 3037024.70 frames. ], batch size: 63, lr: 2.04e-02, grad_scale: 32.0 2023-11-18 11:28:55,293 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.77 vs. limit=15.0 2023-11-18 11:28:58,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=205633.33333333334, ans=0.035 2023-11-18 11:29:04,629 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.08 vs. limit=22.5 2023-11-18 11:29:12,811 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.53 vs. limit=22.5 2023-11-18 11:29:17,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=205766.66666666666, ans=0.0 2023-11-18 11:29:40,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=205900.0, ans=0.125 2023-11-18 11:29:40,936 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.01 vs. limit=15.0 2023-11-18 11:29:42,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=205900.0, ans=0.0 2023-11-18 11:29:49,785 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 6850, loss[loss=0.1651, simple_loss=0.181, pruned_loss=0.0668, audio_tagging_loss=0.00778, over 15188.00 frames. ], tot_loss[loss=0.1177, simple_loss=0.128, pruned_loss=0.04185, audio_tagging_loss=0.01189, over 3036292.89 frames. 
], batch size: 55, lr: 2.04e-02, grad_scale: 32.0 2023-11-18 11:29:57,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=205966.66666666666, ans=0.125 2023-11-18 11:30:07,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=206033.33333333334, ans=0.125 2023-11-18 11:30:17,594 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.358e+01 9.332e+01 1.055e+02 1.143e+02 1.752e+02, threshold=2.109e+02, percent-clipped=0.0 2023-11-18 11:30:26,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=206166.66666666666, ans=0.035 2023-11-18 11:30:26,552 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.37 vs. limit=12.0 2023-11-18 11:30:37,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=206233.33333333334, ans=0.2 2023-11-18 11:30:43,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=206233.33333333334, ans=0.0 2023-11-18 11:30:45,650 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 6900, loss[loss=0.0877, simple_loss=0.09526, pruned_loss=0.02566, audio_tagging_loss=0.01442, over 14465.00 frames. ], tot_loss[loss=0.116, simple_loss=0.1266, pruned_loss=0.04091, audio_tagging_loss=0.01183, over 3036191.76 frames. ], batch size: 56, lr: 2.04e-02, grad_scale: 32.0 2023-11-18 11:30:53,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=206300.0, ans=0.0 2023-11-18 11:31:05,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=206366.66666666666, ans=0.125 2023-11-18 11:31:11,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=206433.33333333334, ans=0.1 2023-11-18 11:31:14,264 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.72 vs. limit=22.5 2023-11-18 11:31:17,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=206500.0, ans=0.0 2023-11-18 11:31:17,972 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.68 vs. limit=15.0 2023-11-18 11:31:27,828 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 11:31:40,908 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 6950, loss[loss=0.1164, simple_loss=0.1195, pruned_loss=0.04165, audio_tagging_loss=0.01503, over 15132.00 frames. ], tot_loss[loss=0.1153, simple_loss=0.1257, pruned_loss=0.04052, audio_tagging_loss=0.01195, over 3038537.05 frames. 
], batch size: 56, lr: 2.03e-02, grad_scale: 32.0 2023-11-18 11:31:47,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=206633.33333333334, ans=0.1 2023-11-18 11:32:07,763 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.89 vs. limit=10.0 2023-11-18 11:32:08,295 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.140e+01 9.386e+01 1.046e+02 1.149e+02 1.697e+02, threshold=2.092e+02, percent-clipped=0.0 2023-11-18 11:32:10,382 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.70 vs. limit=15.0 2023-11-18 11:32:35,652 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 7000, loss[loss=0.1423, simple_loss=0.1615, pruned_loss=0.04841, audio_tagging_loss=0.01313, over 16679.00 frames. ], tot_loss[loss=0.116, simple_loss=0.1265, pruned_loss=0.0407, audio_tagging_loss=0.01204, over 3043192.01 frames. ], batch size: 60, lr: 2.03e-02, grad_scale: 32.0 2023-11-18 11:32:35,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=206966.66666666666, ans=0.125 2023-11-18 11:33:02,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=207100.0, ans=0.125 2023-11-18 11:33:08,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=207166.66666666666, ans=0.1 2023-11-18 11:33:08,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=207166.66666666666, ans=0.0 2023-11-18 11:33:20,870 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.18 vs. limit=15.0 2023-11-18 11:33:31,926 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 7050, loss[loss=0.1319, simple_loss=0.148, pruned_loss=0.04829, audio_tagging_loss=0.009607, over 15620.00 frames. ], tot_loss[loss=0.1152, simple_loss=0.1258, pruned_loss=0.04017, audio_tagging_loss=0.01212, over 3041854.75 frames. 
], batch size: 55, lr: 2.03e-02, grad_scale: 32.0 2023-11-18 11:33:33,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=207300.0, ans=0.07 2023-11-18 11:33:40,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=207300.0, ans=0.0 2023-11-18 11:33:42,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=207366.66666666666, ans=0.2 2023-11-18 11:33:58,909 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.932e+01 9.545e+01 1.049e+02 1.197e+02 1.734e+02, threshold=2.097e+02, percent-clipped=0.0 2023-11-18 11:34:10,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=207500.0, ans=0.1 2023-11-18 11:34:17,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=207566.66666666666, ans=0.1 2023-11-18 11:34:25,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=207566.66666666666, ans=0.0 2023-11-18 11:34:27,344 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 7100, loss[loss=0.1041, simple_loss=0.1151, pruned_loss=0.03188, audio_tagging_loss=0.01467, over 16220.00 frames. ], tot_loss[loss=0.1166, simple_loss=0.127, pruned_loss=0.04099, audio_tagging_loss=0.01215, over 3044592.10 frames. ], batch size: 59, lr: 2.03e-02, grad_scale: 32.0 2023-11-18 11:34:27,828 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.00 vs. limit=15.0 2023-11-18 11:35:00,151 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=22.5 2023-11-18 11:35:06,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=207833.33333333334, ans=0.1 2023-11-18 11:35:11,384 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.21 vs. limit=15.0 2023-11-18 11:35:22,610 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 7150, loss[loss=0.1314, simple_loss=0.1424, pruned_loss=0.04903, audio_tagging_loss=0.01116, over 15445.00 frames. ], tot_loss[loss=0.1161, simple_loss=0.1264, pruned_loss=0.04074, audio_tagging_loss=0.01218, over 3045061.04 frames. ], batch size: 57, lr: 2.03e-02, grad_scale: 32.0 2023-11-18 11:35:33,303 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=16.51 vs. 
limit=15.0 2023-11-18 11:35:44,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=208100.0, ans=0.125 2023-11-18 11:35:48,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=208100.0, ans=0.125 2023-11-18 11:35:49,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=208100.0, ans=0.125 2023-11-18 11:35:49,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=208100.0, ans=0.125 2023-11-18 11:35:51,545 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.725e+01 9.617e+01 1.079e+02 1.252e+02 1.872e+02, threshold=2.157e+02, percent-clipped=0.0 2023-11-18 11:36:00,571 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.80 vs. limit=15.0 2023-11-18 11:36:01,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=208166.66666666666, ans=0.0 2023-11-18 11:36:02,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=208166.66666666666, ans=0.1 2023-11-18 11:36:09,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=208233.33333333334, ans=0.0 2023-11-18 11:36:14,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=208233.33333333334, ans=0.125 2023-11-18 11:36:19,141 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 7200, loss[loss=0.1066, simple_loss=0.1194, pruned_loss=0.03636, audio_tagging_loss=0.01053, over 16355.00 frames. ], tot_loss[loss=0.1161, simple_loss=0.1267, pruned_loss=0.0406, audio_tagging_loss=0.01219, over 3045175.58 frames. ], batch size: 61, lr: 2.03e-02, grad_scale: 32.0 2023-11-18 11:36:32,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=208366.66666666666, ans=0.125 2023-11-18 11:36:34,252 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 11:36:44,196 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.47 vs. limit=15.0 2023-11-18 11:37:00,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=208500.0, ans=0.125 2023-11-18 11:37:01,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=208500.0, ans=0.0 2023-11-18 11:37:15,130 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 7250, loss[loss=0.1274, simple_loss=0.1444, pruned_loss=0.04514, audio_tagging_loss=0.01009, over 14868.00 frames. ], tot_loss[loss=0.1161, simple_loss=0.1267, pruned_loss=0.04062, audio_tagging_loss=0.01217, over 3045314.90 frames. ], batch size: 54, lr: 2.02e-02, grad_scale: 32.0 2023-11-18 11:37:18,608 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.47 vs. 
limit=15.0 2023-11-18 11:37:27,073 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.45 vs. limit=15.0 2023-11-18 11:37:28,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=208700.0, ans=0.0 2023-11-18 11:37:39,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=208766.66666666666, ans=0.125 2023-11-18 11:37:41,783 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.722e+01 9.494e+01 1.040e+02 1.201e+02 1.952e+02, threshold=2.079e+02, percent-clipped=0.0 2023-11-18 11:38:09,853 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 7300, loss[loss=0.1541, simple_loss=0.1837, pruned_loss=0.05208, audio_tagging_loss=0.01014, over 16715.00 frames. ], tot_loss[loss=0.1164, simple_loss=0.1271, pruned_loss=0.0407, audio_tagging_loss=0.01212, over 3043136.12 frames. ], batch size: 58, lr: 2.02e-02, grad_scale: 32.0 2023-11-18 11:38:11,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=208966.66666666666, ans=0.07 2023-11-18 11:38:17,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=208966.66666666666, ans=0.0 2023-11-18 11:38:22,683 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.56 vs. limit=22.5 2023-11-18 11:38:35,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=209100.0, ans=0.125 2023-11-18 11:38:52,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=209166.66666666666, ans=0.2 2023-11-18 11:38:54,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=209233.33333333334, ans=0.2 2023-11-18 11:39:05,498 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 7350, loss[loss=0.08218, simple_loss=0.08789, pruned_loss=0.02574, audio_tagging_loss=0.0125, over 15356.00 frames. ], tot_loss[loss=0.1166, simple_loss=0.1274, pruned_loss=0.04095, audio_tagging_loss=0.01194, over 3042343.31 frames. 
], batch size: 60, lr: 2.02e-02, grad_scale: 32.0 2023-11-18 11:39:14,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=209300.0, ans=0.125 2023-11-18 11:39:16,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=209366.66666666666, ans=0.2 2023-11-18 11:39:33,658 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.784e+01 9.639e+01 1.055e+02 1.233e+02 1.941e+02, threshold=2.110e+02, percent-clipped=0.0 2023-11-18 11:39:34,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=209433.33333333334, ans=0.125 2023-11-18 11:39:39,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=209500.0, ans=0.125 2023-11-18 11:39:49,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=209566.66666666666, ans=0.125 2023-11-18 11:39:50,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=209566.66666666666, ans=6.0 2023-11-18 11:39:51,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=209566.66666666666, ans=0.125 2023-11-18 11:40:01,552 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 7400, loss[loss=0.09471, simple_loss=0.09791, pruned_loss=0.03608, audio_tagging_loss=0.009687, over 14503.00 frames. ], tot_loss[loss=0.1158, simple_loss=0.1263, pruned_loss=0.0407, audio_tagging_loss=0.0119, over 3038808.94 frames. ], batch size: 58, lr: 2.02e-02, grad_scale: 32.0 2023-11-18 11:40:03,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=209633.33333333334, ans=0.2 2023-11-18 11:40:03,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=209633.33333333334, ans=0.07 2023-11-18 11:40:17,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=209700.0, ans=0.125 2023-11-18 11:40:56,855 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 7450, loss[loss=0.1084, simple_loss=0.1152, pruned_loss=0.03882, audio_tagging_loss=0.01194, over 14511.00 frames. ], tot_loss[loss=0.116, simple_loss=0.1264, pruned_loss=0.04094, audio_tagging_loss=0.01185, over 3046305.86 frames. ], batch size: 55, lr: 2.02e-02, grad_scale: 32.0 2023-11-18 11:41:07,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.83 vs. 
limit=15.0 2023-11-18 11:41:09,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=210033.33333333334, ans=0.125 2023-11-18 11:41:24,897 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.797e+01 9.734e+01 1.062e+02 1.217e+02 1.649e+02, threshold=2.124e+02, percent-clipped=0.0 2023-11-18 11:41:42,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=210233.33333333334, ans=0.2 2023-11-18 11:41:50,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=210233.33333333334, ans=0.09899494936611666 2023-11-18 11:41:52,347 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 7500, loss[loss=0.1378, simple_loss=0.1529, pruned_loss=0.04809, audio_tagging_loss=0.0133, over 15534.00 frames. ], tot_loss[loss=0.116, simple_loss=0.1266, pruned_loss=0.04087, audio_tagging_loss=0.0119, over 3047631.35 frames. ], batch size: 56, lr: 2.02e-02, grad_scale: 32.0 2023-11-18 11:42:08,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=210366.66666666666, ans=0.125 2023-11-18 11:42:17,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=210433.33333333334, ans=0.0 2023-11-18 11:42:17,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=210433.33333333334, ans=0.125 2023-11-18 11:42:26,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=210500.0, ans=0.05 2023-11-18 11:42:34,034 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.18 vs. limit=22.5 2023-11-18 11:42:48,181 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 7550, loss[loss=0.08705, simple_loss=0.09259, pruned_loss=0.03053, audio_tagging_loss=0.01022, over 15414.00 frames. ], tot_loss[loss=0.1159, simple_loss=0.1265, pruned_loss=0.04087, audio_tagging_loss=0.01176, over 3043295.53 frames. ], batch size: 61, lr: 2.02e-02, grad_scale: 32.0 2023-11-18 11:42:50,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=210633.33333333334, ans=0.0 2023-11-18 11:42:55,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=210633.33333333334, ans=0.125 2023-11-18 11:43:05,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.68 vs. limit=22.5 2023-11-18 11:43:10,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=210766.66666666666, ans=0.125 2023-11-18 11:43:15,905 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 1.017e+02 1.103e+02 1.286e+02 2.062e+02, threshold=2.206e+02, percent-clipped=0.0 2023-11-18 11:43:43,331 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 7600, loss[loss=0.09935, simple_loss=0.1072, pruned_loss=0.03207, audio_tagging_loss=0.01371, over 15189.00 frames. ], tot_loss[loss=0.1153, simple_loss=0.1259, pruned_loss=0.04057, audio_tagging_loss=0.01179, over 3039648.68 frames. 
], batch size: 57, lr: 2.01e-02, grad_scale: 32.0 2023-11-18 11:44:05,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=211100.0, ans=0.125 2023-11-18 11:44:05,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=211100.0, ans=0.125 2023-11-18 11:44:10,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=211100.0, ans=0.2 2023-11-18 11:44:11,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=211100.0, ans=0.125 2023-11-18 11:44:39,627 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 7650, loss[loss=0.08009, simple_loss=0.07752, pruned_loss=0.02482, audio_tagging_loss=0.0165, over 16327.00 frames. ], tot_loss[loss=0.1155, simple_loss=0.1261, pruned_loss=0.04059, audio_tagging_loss=0.01186, over 3046416.45 frames. ], batch size: 64, lr: 2.01e-02, grad_scale: 32.0 2023-11-18 11:45:04,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=211433.33333333334, ans=0.125 2023-11-18 11:45:07,183 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.141e+01 9.868e+01 1.071e+02 1.213e+02 1.962e+02, threshold=2.142e+02, percent-clipped=0.0 2023-11-18 11:45:21,601 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.81 vs. limit=15.0 2023-11-18 11:45:22,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=211500.0, ans=0.125 2023-11-18 11:45:35,751 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 7700, loss[loss=0.1235, simple_loss=0.1378, pruned_loss=0.04188, audio_tagging_loss=0.01275, over 15644.00 frames. ], tot_loss[loss=0.1147, simple_loss=0.1253, pruned_loss=0.04014, audio_tagging_loss=0.01188, over 3047102.36 frames. ], batch size: 56, lr: 2.01e-02, grad_scale: 32.0 2023-11-18 11:45:42,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=211633.33333333334, ans=0.04949747468305833 2023-11-18 11:45:45,658 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.78 vs. limit=10.0 2023-11-18 11:45:59,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=211766.66666666666, ans=0.0 2023-11-18 11:46:30,611 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 7750, loss[loss=0.08409, simple_loss=0.0867, pruned_loss=0.02435, audio_tagging_loss=0.0164, over 14814.00 frames. ], tot_loss[loss=0.115, simple_loss=0.1256, pruned_loss=0.04015, audio_tagging_loss=0.01206, over 3048638.38 frames. 
], batch size: 55, lr: 2.01e-02, grad_scale: 32.0 2023-11-18 11:46:51,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=212033.33333333334, ans=0.0 2023-11-18 11:46:59,434 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.217e+01 9.456e+01 1.068e+02 1.204e+02 1.685e+02, threshold=2.136e+02, percent-clipped=0.0 2023-11-18 11:46:59,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=212100.0, ans=0.125 2023-11-18 11:47:02,080 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.87 vs. limit=22.5 2023-11-18 11:47:10,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=212166.66666666666, ans=0.0 2023-11-18 11:47:12,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=212166.66666666666, ans=0.1 2023-11-18 11:47:12,682 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.87 vs. limit=15.0 2023-11-18 11:47:22,050 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.42 vs. limit=22.5 2023-11-18 11:47:26,654 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 7800, loss[loss=0.1024, simple_loss=0.1095, pruned_loss=0.0337, audio_tagging_loss=0.01396, over 14372.00 frames. ], tot_loss[loss=0.1154, simple_loss=0.1259, pruned_loss=0.0402, audio_tagging_loss=0.01224, over 3045414.96 frames. ], batch size: 56, lr: 2.01e-02, grad_scale: 32.0 2023-11-18 11:47:30,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=212300.0, ans=0.125 2023-11-18 11:47:53,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=212433.33333333334, ans=0.125 2023-11-18 11:48:02,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=212500.0, ans=0.07 2023-11-18 11:48:22,972 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 7850, loss[loss=0.09755, simple_loss=0.1065, pruned_loss=0.03107, audio_tagging_loss=0.01325, over 14923.00 frames. ], tot_loss[loss=0.1165, simple_loss=0.1269, pruned_loss=0.04082, audio_tagging_loss=0.01224, over 3055670.38 frames. 
], batch size: 57, lr: 2.01e-02, grad_scale: 32.0 2023-11-18 11:48:32,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=212700.0, ans=0.125 2023-11-18 11:48:35,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=212700.0, ans=0.125 2023-11-18 11:48:42,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=212766.66666666666, ans=0.2 2023-11-18 11:48:49,654 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.949e+01 1.011e+02 1.139e+02 1.309e+02 2.076e+02, threshold=2.278e+02, percent-clipped=0.0 2023-11-18 11:48:54,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=212833.33333333334, ans=0.125 2023-11-18 11:48:57,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=212833.33333333334, ans=0.1 2023-11-18 11:48:57,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=212833.33333333334, ans=0.2 2023-11-18 11:49:10,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=212900.0, ans=0.125 2023-11-18 11:49:10,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=212900.0, ans=0.1 2023-11-18 11:49:13,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=212900.0, ans=0.0 2023-11-18 11:49:17,814 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 7900, loss[loss=0.1411, simple_loss=0.1554, pruned_loss=0.05223, audio_tagging_loss=0.01122, over 15451.00 frames. ], tot_loss[loss=0.116, simple_loss=0.1262, pruned_loss=0.04056, audio_tagging_loss=0.01236, over 3054266.02 frames. ], batch size: 58, lr: 2.00e-02, grad_scale: 32.0 2023-11-18 11:49:33,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=213033.33333333334, ans=0.0 2023-11-18 11:49:39,504 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.86 vs. limit=6.0 2023-11-18 11:49:47,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=213100.0, ans=0.0 2023-11-18 11:50:11,469 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.30 vs. limit=15.0 2023-11-18 11:50:11,966 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 7950, loss[loss=0.1342, simple_loss=0.1475, pruned_loss=0.04967, audio_tagging_loss=0.01078, over 15297.00 frames. ], tot_loss[loss=0.1163, simple_loss=0.1269, pruned_loss=0.04035, audio_tagging_loss=0.01245, over 3054417.54 frames. ], batch size: 55, lr: 2.00e-02, grad_scale: 32.0 2023-11-18 11:50:20,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=213300.0, ans=0.2 2023-11-18 11:50:28,563 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 11:50:37,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=213433.33333333334, ans=0.2
2023-11-18 11:50:42,604 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.234e+01 9.499e+01 1.075e+02 1.220e+02 1.746e+02, threshold=2.150e+02, percent-clipped=0.0
2023-11-18 11:51:01,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=213566.66666666666, ans=0.0
2023-11-18 11:51:06,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=213566.66666666666, ans=0.125
2023-11-18 11:51:11,124 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 8000, loss[loss=0.1358, simple_loss=0.1317, pruned_loss=0.05384, audio_tagging_loss=0.01609, over 16699.00 frames. ], tot_loss[loss=0.1156, simple_loss=0.1262, pruned_loss=0.04013, audio_tagging_loss=0.01243, over 3051588.21 frames. ], batch size: 64, lr: 2.00e-02, grad_scale: 32.0
2023-11-18 11:51:14,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=213633.33333333334, ans=0.2
2023-11-18 11:51:34,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=213766.66666666666, ans=0.125
2023-11-18 11:51:41,550 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.67 vs. limit=15.0
2023-11-18 11:51:51,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=213833.33333333334, ans=0.2
2023-11-18 11:52:02,098 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.42 vs. limit=12.0
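The WARNING above (Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav) is the usual transducer length check: after the frontend's convolutional subsampling, a 100-frame cut yields only 23 encoder frames, fewer than its 24 tokens, so no monotonic alignment exists and the cut is dropped. A sketch of the implied filter; the helper name is illustrative, and the frame arithmetic is assumed from the logged 100 -> 23 reduction:

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Two stride-2 stages over a 7-frame context, matching the log:
        # ((100 - 7) // 2 + 1) // 2 = 23 frames after subsampling.
        t = ((num_frames - 7) // 2 + 1) // 2
        # A transducer needs at least one encoder frame per output token.
        return t >= num_tokens

    assert keep_cut(100, 24) is False  # 23 frames < 24 tokens -> excluded

2023-11-18 11:52:05,882 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 8050, loss[loss=0.128, simple_loss=0.1402, pruned_loss=0.04679, audio_tagging_loss=0.01111, over 13959.00 frames. ], tot_loss[loss=0.1157, simple_loss=0.1257, pruned_loss=0.04031, audio_tagging_loss=0.01259, over 3047226.40 frames.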
], batch size: 55, lr: 2.00e-02, grad_scale: 32.0 2023-11-18 11:52:06,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=213966.66666666666, ans=0.125 2023-11-18 11:52:17,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=214033.33333333334, ans=0.2 2023-11-18 11:52:23,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=214033.33333333334, ans=0.0 2023-11-18 11:52:24,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=214033.33333333334, ans=0.1 2023-11-18 11:52:28,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=214100.0, ans=0.2 2023-11-18 11:52:33,824 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.767e+01 9.594e+01 1.075e+02 1.227e+02 1.823e+02, threshold=2.150e+02, percent-clipped=0.0 2023-11-18 11:52:35,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=214100.0, ans=0.2 2023-11-18 11:52:39,154 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0 2023-11-18 11:53:00,852 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 8100, loss[loss=0.123, simple_loss=0.1333, pruned_loss=0.04441, audio_tagging_loss=0.01193, over 14222.00 frames. ], tot_loss[loss=0.1159, simple_loss=0.1261, pruned_loss=0.04044, audio_tagging_loss=0.01244, over 3053374.84 frames. ], batch size: 53, lr: 2.00e-02, grad_scale: 32.0 2023-11-18 11:53:07,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=214300.0, ans=0.1 2023-11-18 11:53:23,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=214433.33333333334, ans=0.0 2023-11-18 11:53:40,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=214500.0, ans=0.2 2023-11-18 11:53:41,400 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 11:53:53,396 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.25 vs. limit=6.0 2023-11-18 11:53:56,999 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 8150, loss[loss=0.06874, simple_loss=0.06856, pruned_loss=0.02022, audio_tagging_loss=0.01423, over 15595.00 frames. ], tot_loss[loss=0.116, simple_loss=0.1263, pruned_loss=0.04059, audio_tagging_loss=0.01224, over 3061654.51 frames. 
], batch size: 62, lr: 2.00e-02, grad_scale: 32.0 2023-11-18 11:54:04,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=214633.33333333334, ans=0.0 2023-11-18 11:54:22,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=214766.66666666666, ans=0.125 2023-11-18 11:54:24,398 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 9.659e+01 1.081e+02 1.221e+02 1.815e+02, threshold=2.163e+02, percent-clipped=0.0 2023-11-18 11:54:34,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=214833.33333333334, ans=0.2 2023-11-18 11:54:42,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=214900.0, ans=0.0 2023-11-18 11:54:48,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=214900.0, ans=0.5 2023-11-18 11:54:51,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=214900.0, ans=0.125 2023-11-18 11:54:53,066 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 8200, loss[loss=0.1286, simple_loss=0.1381, pruned_loss=0.04676, audio_tagging_loss=0.01276, over 13976.00 frames. ], tot_loss[loss=0.1161, simple_loss=0.127, pruned_loss=0.04064, audio_tagging_loss=0.012, over 3055288.96 frames. ], batch size: 52, lr: 2.00e-02, grad_scale: 32.0 2023-11-18 11:54:53,096 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 11:54:54,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=214966.66666666666, ans=0.1 2023-11-18 11:54:59,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=214966.66666666666, ans=0.1 2023-11-18 11:55:03,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=215033.33333333334, ans=0.125 2023-11-18 11:55:32,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=215166.66666666666, ans=0.125 2023-11-18 11:55:35,843 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.76 vs. 
limit=22.5 2023-11-18 11:55:39,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=215233.33333333334, ans=0.1 2023-11-18 11:55:42,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=215233.33333333334, ans=0.125 2023-11-18 11:55:43,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=215233.33333333334, ans=0.2 2023-11-18 11:55:48,002 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 8250, loss[loss=0.1257, simple_loss=0.131, pruned_loss=0.04865, audio_tagging_loss=0.01157, over 14407.00 frames. ], tot_loss[loss=0.1157, simple_loss=0.1267, pruned_loss=0.0405, audio_tagging_loss=0.01188, over 3051266.51 frames. ], batch size: 55, lr: 1.99e-02, grad_scale: 32.0 2023-11-18 11:56:14,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=215433.33333333334, ans=0.0 2023-11-18 11:56:16,466 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.670e+01 9.340e+01 1.070e+02 1.193e+02 1.705e+02, threshold=2.140e+02, percent-clipped=0.0 2023-11-18 11:56:18,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=215433.33333333334, ans=0.0 2023-11-18 11:56:25,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=215500.0, ans=0.2 2023-11-18 11:56:35,712 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.73 vs. limit=15.0 2023-11-18 11:56:36,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=215566.66666666666, ans=0.125 2023-11-18 11:56:43,853 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 8300, loss[loss=0.1182, simple_loss=0.1238, pruned_loss=0.04116, audio_tagging_loss=0.01514, over 16101.00 frames. ], tot_loss[loss=0.1148, simple_loss=0.1256, pruned_loss=0.04013, audio_tagging_loss=0.01189, over 3051130.70 frames. ], batch size: 60, lr: 1.99e-02, grad_scale: 32.0 2023-11-18 11:56:44,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=215633.33333333334, ans=0.1 2023-11-18 11:56:55,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=215700.0, ans=0.0 2023-11-18 11:56:59,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=215700.0, ans=0.1 2023-11-18 11:57:05,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=215766.66666666666, ans=12.0 2023-11-18 11:57:15,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=215833.33333333334, ans=0.1 2023-11-18 11:57:26,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=215833.33333333334, ans=0.025 2023-11-18 11:57:39,442 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.46 vs. 
limit=22.5
2023-11-18 11:57:39,801 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 8350, loss[loss=0.1288, simple_loss=0.1478, pruned_loss=0.04494, audio_tagging_loss=0.009942, over 15105.00 frames. ], tot_loss[loss=0.1159, simple_loss=0.1268, pruned_loss=0.04062, audio_tagging_loss=0.01182, over 3054277.02 frames. ], batch size: 57, lr: 1.99e-02, grad_scale: 32.0
2023-11-18 11:57:42,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=215966.66666666666, ans=0.04949747468305833
2023-11-18 11:57:48,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=215966.66666666666, ans=0.125
2023-11-18 11:57:59,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=216033.33333333334, ans=0.125
2023-11-18 11:58:04,766 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.91 vs. limit=22.5
2023-11-18 11:58:07,653 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 9.952e+01 1.113e+02 1.251e+02 3.254e+02, threshold=2.227e+02, percent-clipped=1.0
2023-11-18 11:58:08,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=216100.0, ans=0.0
2023-11-18 11:58:26,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=216233.33333333334, ans=0.125
2023-11-18 11:58:35,200 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 8400, loss[loss=0.1495, simple_loss=0.1711, pruned_loss=0.05509, audio_tagging_loss=0.008909, over 14608.00 frames. ], tot_loss[loss=0.1169, simple_loss=0.1284, pruned_loss=0.04106, audio_tagging_loss=0.01168, over 3044786.62 frames. ], batch size: 53, lr: 1.99e-02, grad_scale: 32.0
2023-11-18 11:58:51,527 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.62 vs. limit=12.0
2023-11-18 11:58:52,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=216366.66666666666, ans=0.125
2023-11-18 11:58:53,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=216366.66666666666, ans=0.05
2023-11-18 11:58:54,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=216366.66666666666, ans=0.125
2023-11-18 11:58:58,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=216433.33333333334, ans=0.07
2023-11-18 11:59:22,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=216566.66666666666, ans=0.125
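In the optim.py:476 records, the five numbers after "grad-norm quartiles" are the min/25%/median/75%/max of recently observed gradient norms, and the logged threshold tracks Clipping_scale times the median (here 2.0 * 1.113e+02 = 2.226e+02, against the reported 2.227e+02). A sketch of that clipping scheme, assuming the norms are kept in a sliding window; the function name and exact estimator are illustrative, not optim.py's API:

    import torch

    def clip_to_scaled_median(params, recent_norms, clipping_scale=2.0):
        # Quartiles of recently observed gradient norms, as printed in the log.
        q = torch.quantile(torch.tensor(recent_norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]  # 2.0 x median, matching the log
        total = torch.nn.utils.clip_grad_norm_(params, max_norm=float(threshold))
        was_clipped = float(total > threshold)  # aggregated into percent-clipped
        return q, threshold, was_clipped

2023-11-18 11:59:30,808 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 8450, loss[loss=0.1582, simple_loss=0.186, pruned_loss=0.05527, audio_tagging_loss=0.009951, over 15237.00 frames. ], tot_loss[loss=0.1159, simple_loss=0.1268, pruned_loss=0.04067, audio_tagging_loss=0.01184, over 3049900.06 frames.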
], batch size: 54, lr: 1.99e-02, grad_scale: 32.0 2023-11-18 11:59:46,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=216700.0, ans=0.125 2023-11-18 11:59:47,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=216700.0, ans=0.125 2023-11-18 11:59:58,324 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 9.450e+01 1.064e+02 1.181e+02 2.171e+02, threshold=2.129e+02, percent-clipped=0.0 2023-11-18 12:00:06,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=216833.33333333334, ans=0.125 2023-11-18 12:00:19,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=216900.0, ans=0.125 2023-11-18 12:00:19,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=216900.0, ans=0.125 2023-11-18 12:00:22,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=216900.0, ans=0.0 2023-11-18 12:00:26,218 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 8500, loss[loss=0.1186, simple_loss=0.1297, pruned_loss=0.04082, audio_tagging_loss=0.01289, over 16341.00 frames. ], tot_loss[loss=0.1152, simple_loss=0.1262, pruned_loss=0.04021, audio_tagging_loss=0.01187, over 3045911.91 frames. ], batch size: 62, lr: 1.99e-02, grad_scale: 32.0 2023-11-18 12:00:44,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=217033.33333333334, ans=0.125 2023-11-18 12:00:51,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=217100.0, ans=0.125 2023-11-18 12:00:51,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=217100.0, ans=0.2 2023-11-18 12:01:07,777 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.29 vs. limit=15.0 2023-11-18 12:01:10,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=217233.33333333334, ans=0.1 2023-11-18 12:01:21,546 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 8550, loss[loss=0.07784, simple_loss=0.08109, pruned_loss=0.02413, audio_tagging_loss=0.01316, over 14640.00 frames. ], tot_loss[loss=0.1158, simple_loss=0.1266, pruned_loss=0.0405, audio_tagging_loss=0.01205, over 3043985.89 frames. 
], batch size: 55, lr: 1.99e-02, grad_scale: 32.0
2023-11-18 12:01:27,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=217300.0, ans=10.0
2023-11-18 12:01:42,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=217366.66666666666, ans=0.125
2023-11-18 12:01:49,700 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.522e+01 1.000e+02 1.079e+02 1.274e+02 1.597e+02, threshold=2.158e+02, percent-clipped=0.0
2023-11-18 12:01:54,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=217500.0, ans=0.0
2023-11-18 12:02:16,724 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.94 vs. limit=15.0
2023-11-18 12:02:17,150 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 8600, loss[loss=0.08004, simple_loss=0.07162, pruned_loss=0.02989, audio_tagging_loss=0.01434, over 13306.00 frames. ], tot_loss[loss=0.1153, simple_loss=0.126, pruned_loss=0.04025, audio_tagging_loss=0.01208, over 3037104.83 frames. ], batch size: 54, lr: 1.98e-02, grad_scale: 32.0
2023-11-18 12:02:23,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=217633.33333333334, ans=0.09899494936611666
2023-11-18 12:02:32,693 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.09 vs. limit=15.0
2023-11-18 12:02:50,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=217833.33333333334, ans=0.0
2023-11-18 12:02:53,425 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.09 vs. limit=12.0
2023-11-18 12:02:53,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=217833.33333333334, ans=0.0
2023-11-18 12:02:57,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=217833.33333333334, ans=0.1
2023-11-18 12:02:57,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=217833.33333333334, ans=0.5
2023-11-18 12:02:58,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=217833.33333333334, ans=0.125
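The scaling.py:213 ScheduledFloat records show per-module hyperparameters (dropout_p, skip rates, balancer probabilities, whitening limits) being read off schedules keyed on batch_count: the same name reports different ans values as training progresses. A minimal sketch of such a piecewise-linear schedule; the breakpoints below are invented for illustration, since the real schedules are set per module:

    def scheduled_float(batch_count,
                        points=((0.0, 0.3), (4000.0, 0.2), (20000.0, 0.1))):
        # Linearly interpolate between (batch_count, value) breakpoints,
        # clamping at both ends, so e.g. dropout decays as training proceeds.
        if batch_count <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) * (y1 - y0) / (x1 - x0)
        return points[-1][1]

2023-11-18 12:03:13,322 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 8650, loss[loss=0.07999, simple_loss=0.0855, pruned_loss=0.02295, audio_tagging_loss=0.01429, over 16360.00 frames. ], tot_loss[loss=0.1156, simple_loss=0.1264, pruned_loss=0.04033, audio_tagging_loss=0.01207, over 3039420.71 frames.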
], batch size: 62, lr: 1.98e-02, grad_scale: 32.0 2023-11-18 12:03:26,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=218033.33333333334, ans=0.0 2023-11-18 12:03:38,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=218100.0, ans=0.0 2023-11-18 12:03:38,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=218100.0, ans=0.125 2023-11-18 12:03:39,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=218100.0, ans=0.0 2023-11-18 12:03:40,758 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 9.473e+01 1.061e+02 1.180e+02 2.111e+02, threshold=2.123e+02, percent-clipped=0.0 2023-11-18 12:03:42,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=218100.0, ans=0.0 2023-11-18 12:03:58,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=218233.33333333334, ans=15.0 2023-11-18 12:03:59,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=218233.33333333334, ans=0.5 2023-11-18 12:04:04,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=218233.33333333334, ans=0.0 2023-11-18 12:04:08,811 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 8700, loss[loss=0.1442, simple_loss=0.1616, pruned_loss=0.05067, audio_tagging_loss=0.01276, over 15882.00 frames. ], tot_loss[loss=0.1147, simple_loss=0.1255, pruned_loss=0.03965, audio_tagging_loss=0.01226, over 3043669.00 frames. ], batch size: 57, lr: 1.98e-02, grad_scale: 32.0 2023-11-18 12:04:24,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=218366.66666666666, ans=0.09899494936611666 2023-11-18 12:04:28,571 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.46 vs. limit=5.0 2023-11-18 12:04:41,796 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.22 vs. limit=15.0 2023-11-18 12:04:55,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=218566.66666666666, ans=0.125 2023-11-18 12:05:04,350 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 8750, loss[loss=0.1093, simple_loss=0.121, pruned_loss=0.03596, audio_tagging_loss=0.01283, over 16220.00 frames. ], tot_loss[loss=0.1159, simple_loss=0.1269, pruned_loss=0.04025, audio_tagging_loss=0.01219, over 3044154.34 frames. ], batch size: 61, lr: 1.98e-02, grad_scale: 32.0 2023-11-18 12:05:04,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=218633.33333333334, ans=0.09899494936611666 2023-11-18 12:05:29,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=218766.66666666666, ans=0.125 2023-11-18 12:05:31,608 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.74 vs. 
limit=10.0 2023-11-18 12:05:32,029 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.716e+01 9.349e+01 1.069e+02 1.188e+02 1.662e+02, threshold=2.138e+02, percent-clipped=0.0 2023-11-18 12:05:57,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=218900.0, ans=0.0 2023-11-18 12:06:00,717 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 8800, loss[loss=0.1352, simple_loss=0.155, pruned_loss=0.04677, audio_tagging_loss=0.01097, over 14237.00 frames. ], tot_loss[loss=0.1156, simple_loss=0.1265, pruned_loss=0.04013, audio_tagging_loss=0.0122, over 3043078.74 frames. ], batch size: 53, lr: 1.98e-02, grad_scale: 64.0 2023-11-18 12:06:34,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=219166.66666666666, ans=0.125 2023-11-18 12:06:41,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=219166.66666666666, ans=0.125 2023-11-18 12:06:55,584 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 8850, loss[loss=0.1068, simple_loss=0.1215, pruned_loss=0.03468, audio_tagging_loss=0.0114, over 14578.00 frames. ], tot_loss[loss=0.1157, simple_loss=0.1268, pruned_loss=0.04014, audio_tagging_loss=0.01215, over 3042034.29 frames. ], batch size: 53, lr: 1.98e-02, grad_scale: 64.0 2023-11-18 12:06:56,905 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.099e+00 2023-11-18 12:07:05,665 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 12:07:14,136 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0 2023-11-18 12:07:17,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=219433.33333333334, ans=0.0 2023-11-18 12:07:24,031 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 9.378e+01 1.055e+02 1.190e+02 1.653e+02, threshold=2.110e+02, percent-clipped=0.0 2023-11-18 12:07:32,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=219500.0, ans=0.0 2023-11-18 12:07:50,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=219633.33333333334, ans=0.125 2023-11-18 12:07:50,924 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 8900, loss[loss=0.08332, simple_loss=0.08565, pruned_loss=0.02785, audio_tagging_loss=0.01265, over 14562.00 frames. ], tot_loss[loss=0.1163, simple_loss=0.1274, pruned_loss=0.04052, audio_tagging_loss=0.01205, over 3041412.90 frames. 
], batch size: 55, lr: 1.98e-02, grad_scale: 64.0 2023-11-18 12:08:00,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=219633.33333333334, ans=0.0 2023-11-18 12:08:24,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=219833.33333333334, ans=0.0 2023-11-18 12:08:26,832 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.95 vs. limit=10.0 2023-11-18 12:08:47,591 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 8950, loss[loss=0.1266, simple_loss=0.1376, pruned_loss=0.0453, audio_tagging_loss=0.01245, over 16403.00 frames. ], tot_loss[loss=0.1163, simple_loss=0.1276, pruned_loss=0.04057, audio_tagging_loss=0.01187, over 3047646.00 frames. ], batch size: 61, lr: 1.97e-02, grad_scale: 64.0 2023-11-18 12:08:48,057 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.69 vs. limit=15.0 2023-11-18 12:08:55,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.18 vs. limit=12.0 2023-11-18 12:08:56,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=219966.66666666666, ans=0.09899494936611666 2023-11-18 12:09:06,344 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.61 vs. limit=6.0 2023-11-18 12:09:06,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=220033.33333333334, ans=0.125 2023-11-18 12:09:12,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=220100.0, ans=0.125 2023-11-18 12:09:14,044 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.540e+01 9.669e+01 1.057e+02 1.154e+02 1.635e+02, threshold=2.114e+02, percent-clipped=0.0 2023-11-18 12:09:15,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=220100.0, ans=0.125 2023-11-18 12:09:17,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=220100.0, ans=0.125 2023-11-18 12:09:18,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=220100.0, ans=0.2 2023-11-18 12:09:23,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=220166.66666666666, ans=0.1 2023-11-18 12:09:33,977 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.37 vs. limit=15.0 2023-11-18 12:09:36,322 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.36 vs. 
limit=22.5
2023-11-18 12:09:41,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=220300.0, ans=0.125
2023-11-18 12:09:41,980 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 9000, loss[loss=0.09132, simple_loss=0.09803, pruned_loss=0.02849, audio_tagging_loss=0.01382, over 13705.00 frames. ], tot_loss[loss=0.1156, simple_loss=0.1268, pruned_loss=0.04047, audio_tagging_loss=0.01176, over 3048661.07 frames. ], batch size: 52, lr: 1.97e-02, grad_scale: 64.0
2023-11-18 12:09:41,981 INFO [train_asr.py:1138] (3/4) Computing validation loss
2023-11-18 12:09:57,166 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.4193, 2.5009, 3.6592, 2.6489], device='cuda:3')
2023-11-18 12:10:14,809 INFO [train_asr.py:1147] (3/4) Epoch 3, validation: loss=0.07901, simple_loss=0.06429, pruned_loss=0.01152, audio_tagging_loss=0.03534, over 4681554.00 frames.
2023-11-18 12:10:14,809 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB
2023-11-18 12:10:21,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=220300.0, ans=0.0
2023-11-18 12:10:22,851 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 12:10:23,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=220300.0, ans=0.125
2023-11-18 12:10:56,503 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=9.853e-01
2023-11-18 12:10:56,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=220500.0, ans=0.125
2023-11-18 12:11:02,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=220566.66666666666, ans=0.125
2023-11-18 12:11:09,377 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 9050, loss[loss=0.07354, simple_loss=0.07535, pruned_loss=0.02126, audio_tagging_loss=0.0146, over 14541.00 frames. ], tot_loss[loss=0.1151, simple_loss=0.1262, pruned_loss=0.04015, audio_tagging_loss=0.01185, over 3047743.92 frames. ], batch size: 58, lr: 1.97e-02, grad_scale: 64.0
2023-11-18 12:11:15,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=220633.33333333334, ans=0.1
2023-11-18 12:11:18,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=220633.33333333334, ans=0.05
2023-11-18 12:11:21,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=220700.0, ans=0.125
2023-11-18 12:11:24,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=220700.0, ans=0.1
2023-11-18 12:11:36,243 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 9.522e+01 1.061e+02 1.198e+02 2.427e+02, threshold=2.123e+02, percent-clipped=1.0
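The batch 9000 block above shows the periodic validation pass: training pauses, the full dev set (4681554.00 frames) is scored with gradients disabled, the per-component averages and peak memory are logged, and training resumes. A sketch of that cadence; compute_loss and the interval default are assumptions for illustration, not the train_asr.py API:

    import torch

    def maybe_validate(model, valid_loader, batch_idx, valid_interval=3000):
        # Run a validation pass every valid_interval batches (batch 9000 above).
        if batch_idx % valid_interval != 0:
            return None
        model.eval()
        total_loss, total_frames = 0.0, 0
        with torch.no_grad():
            for batch in valid_loader:
                loss, num_frames = compute_loss(model, batch)  # hypothetical helper
                total_loss += float(loss)
                total_frames += num_frames
        model.train()
        return total_loss / max(total_frames, 1)

2023-11-18 12:12:04,411 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 9100, loss[loss=0.1166, simple_loss=0.1303, pruned_loss=0.0404, audio_tagging_loss=0.01103, over 16148.00 frames.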
], tot_loss[loss=0.1154, simple_loss=0.1268, pruned_loss=0.04027, audio_tagging_loss=0.01176, over 3050854.89 frames. ], batch size: 59, lr: 1.97e-02, grad_scale: 64.0 2023-11-18 12:12:13,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=220966.66666666666, ans=0.125 2023-11-18 12:12:24,877 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2023-11-18 12:12:28,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=221100.0, ans=0.125 2023-11-18 12:12:33,831 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.69 vs. limit=15.0 2023-11-18 12:12:37,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=221166.66666666666, ans=0.0 2023-11-18 12:12:46,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=221166.66666666666, ans=0.05 2023-11-18 12:12:56,985 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.30 vs. limit=6.0 2023-11-18 12:12:59,690 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.45 vs. limit=22.5 2023-11-18 12:13:00,117 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 9150, loss[loss=0.1134, simple_loss=0.1326, pruned_loss=0.03673, audio_tagging_loss=0.01035, over 15569.00 frames. ], tot_loss[loss=0.115, simple_loss=0.1264, pruned_loss=0.04007, audio_tagging_loss=0.01176, over 3051607.45 frames. ], batch size: 56, lr: 1.97e-02, grad_scale: 64.0 2023-11-18 12:13:05,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=221300.0, ans=0.2 2023-11-18 12:13:06,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=221300.0, ans=0.125 2023-11-18 12:13:09,863 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.24 vs. limit=6.0 2023-11-18 12:13:12,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=221366.66666666666, ans=0.125 2023-11-18 12:13:26,495 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 12:13:28,252 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.044e+01 9.509e+01 1.025e+02 1.123e+02 1.698e+02, threshold=2.050e+02, percent-clipped=0.0 2023-11-18 12:13:33,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=221500.0, ans=0.125 2023-11-18 12:13:57,067 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 9200, loss[loss=0.09273, simple_loss=0.1032, pruned_loss=0.03223, audio_tagging_loss=0.008919, over 14745.00 frames. ], tot_loss[loss=0.1148, simple_loss=0.126, pruned_loss=0.03999, audio_tagging_loss=0.01178, over 3053206.43 frames. 
], batch size: 58, lr: 1.97e-02, grad_scale: 64.0 2023-11-18 12:13:57,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.00 vs. limit=15.0 2023-11-18 12:14:00,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=221633.33333333334, ans=0.125 2023-11-18 12:14:04,598 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 12:14:09,567 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.09 vs. limit=15.0 2023-11-18 12:14:28,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=221766.66666666666, ans=0.0 2023-11-18 12:14:51,676 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 9250, loss[loss=0.1049, simple_loss=0.112, pruned_loss=0.0383, audio_tagging_loss=0.01056, over 16025.00 frames. ], tot_loss[loss=0.1138, simple_loss=0.1246, pruned_loss=0.03965, audio_tagging_loss=0.01185, over 3054211.22 frames. ], batch size: 60, lr: 1.97e-02, grad_scale: 64.0 2023-11-18 12:15:20,392 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.922e+01 9.718e+01 1.095e+02 1.245e+02 2.428e+02, threshold=2.190e+02, percent-clipped=1.0 2023-11-18 12:15:22,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=222100.0, ans=0.125 2023-11-18 12:15:24,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=222166.66666666666, ans=0.0 2023-11-18 12:15:34,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=222166.66666666666, ans=15.0 2023-11-18 12:15:41,933 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.336e-02 2023-11-18 12:15:46,834 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 9300, loss[loss=0.09984, simple_loss=0.1085, pruned_loss=0.02985, audio_tagging_loss=0.01573, over 15597.00 frames. ], tot_loss[loss=0.1134, simple_loss=0.1242, pruned_loss=0.03943, audio_tagging_loss=0.01189, over 3049985.57 frames. 
], batch size: 60, lr: 1.96e-02, grad_scale: 64.0 2023-11-18 12:15:47,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=222300.0, ans=0.125 2023-11-18 12:15:54,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=222300.0, ans=0.1 2023-11-18 12:16:18,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=222433.33333333334, ans=0.125 2023-11-18 12:16:23,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=222500.0, ans=0.125 2023-11-18 12:16:24,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=222500.0, ans=0.05 2023-11-18 12:16:37,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=222566.66666666666, ans=0.5 2023-11-18 12:16:43,425 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 9350, loss[loss=0.1023, simple_loss=0.1085, pruned_loss=0.03563, audio_tagging_loss=0.01243, over 14254.00 frames. ], tot_loss[loss=0.1144, simple_loss=0.1252, pruned_loss=0.03986, audio_tagging_loss=0.01189, over 3051230.43 frames. ], batch size: 54, lr: 1.96e-02, grad_scale: 64.0 2023-11-18 12:16:56,431 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.89 vs. limit=15.0 2023-11-18 12:16:57,549 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.79 vs. limit=15.0 2023-11-18 12:17:00,291 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 12:17:01,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=222700.0, ans=0.07 2023-11-18 12:17:06,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=222766.66666666666, ans=22.5 2023-11-18 12:17:10,520 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.296e+01 9.866e+01 1.128e+02 1.276e+02 1.788e+02, threshold=2.257e+02, percent-clipped=0.0 2023-11-18 12:17:32,442 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.12 vs. limit=22.5 2023-11-18 12:17:39,409 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 9400, loss[loss=0.09224, simple_loss=0.09093, pruned_loss=0.03022, audio_tagging_loss=0.01655, over 14754.00 frames. ], tot_loss[loss=0.1153, simple_loss=0.1261, pruned_loss=0.04022, audio_tagging_loss=0.012, over 3049662.05 frames. ], batch size: 58, lr: 1.96e-02, grad_scale: 64.0 2023-11-18 12:17:43,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=222966.66666666666, ans=0.125 2023-11-18 12:17:44,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=222966.66666666666, ans=0.0 2023-11-18 12:17:57,340 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.08 vs. 
limit=22.5 2023-11-18 12:18:02,084 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.16 vs. limit=15.0 2023-11-18 12:18:16,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=223166.66666666666, ans=0.125 2023-11-18 12:18:24,328 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.02 vs. limit=15.0 2023-11-18 12:18:32,274 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 12:18:34,399 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 9450, loss[loss=0.1339, simple_loss=0.1541, pruned_loss=0.04579, audio_tagging_loss=0.01107, over 15354.00 frames. ], tot_loss[loss=0.1155, simple_loss=0.1267, pruned_loss=0.04011, audio_tagging_loss=0.01203, over 3054594.21 frames. ], batch size: 56, lr: 1.96e-02, grad_scale: 64.0 2023-11-18 12:18:41,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=223300.0, ans=0.125 2023-11-18 12:18:45,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=223366.66666666666, ans=0.0 2023-11-18 12:18:51,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=223366.66666666666, ans=0.2 2023-11-18 12:19:03,041 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 9.898e+01 1.046e+02 1.175e+02 1.737e+02, threshold=2.092e+02, percent-clipped=0.0 2023-11-18 12:19:26,933 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.26 vs. limit=12.0 2023-11-18 12:19:31,342 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 9500, loss[loss=0.1341, simple_loss=0.1549, pruned_loss=0.04339, audio_tagging_loss=0.01328, over 14762.00 frames. ], tot_loss[loss=0.1158, simple_loss=0.1271, pruned_loss=0.04013, audio_tagging_loss=0.01213, over 3053727.32 frames. ], batch size: 54, lr: 1.96e-02, grad_scale: 64.0 2023-11-18 12:19:32,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=223633.33333333334, ans=0.1 2023-11-18 12:20:19,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=223900.0, ans=0.0 2023-11-18 12:20:27,293 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 9550, loss[loss=0.113, simple_loss=0.1185, pruned_loss=0.04092, audio_tagging_loss=0.01281, over 15729.00 frames. ], tot_loss[loss=0.116, simple_loss=0.1273, pruned_loss=0.04021, audio_tagging_loss=0.01215, over 3050487.31 frames. ], batch size: 58, lr: 1.96e-02, grad_scale: 64.0 2023-11-18 12:20:29,968 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.95 vs. 
limit=15.0 2023-11-18 12:20:34,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=223966.66666666666, ans=0.125 2023-11-18 12:20:50,439 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.73 vs. limit=15.0 2023-11-18 12:20:55,688 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.492e+01 1.006e+02 1.117e+02 1.248e+02 1.898e+02, threshold=2.233e+02, percent-clipped=0.0 2023-11-18 12:21:06,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=224166.66666666666, ans=0.2 2023-11-18 12:21:22,534 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 9600, loss[loss=0.1242, simple_loss=0.1455, pruned_loss=0.0398, audio_tagging_loss=0.01168, over 14620.00 frames. ], tot_loss[loss=0.117, simple_loss=0.1283, pruned_loss=0.04064, audio_tagging_loss=0.01217, over 3051364.24 frames. ], batch size: 55, lr: 1.96e-02, grad_scale: 64.0 2023-11-18 12:21:22,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=224300.0, ans=0.2 2023-11-18 12:21:27,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=224300.0, ans=0.125 2023-11-18 12:21:38,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=224366.66666666666, ans=0.0 2023-11-18 12:22:10,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=224566.66666666666, ans=0.125 2023-11-18 12:22:18,509 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 9650, loss[loss=0.1058, simple_loss=0.1137, pruned_loss=0.03586, audio_tagging_loss=0.01313, over 15714.00 frames. ], tot_loss[loss=0.1159, simple_loss=0.1271, pruned_loss=0.04019, audio_tagging_loss=0.01216, over 3044391.04 frames. ], batch size: 57, lr: 1.95e-02, grad_scale: 64.0 2023-11-18 12:22:24,991 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.39 vs. limit=6.0 2023-11-18 12:22:45,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=224766.66666666666, ans=0.2 2023-11-18 12:22:45,985 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 9.215e+01 1.048e+02 1.170e+02 1.955e+02, threshold=2.095e+02, percent-clipped=0.0 2023-11-18 12:22:53,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=224833.33333333334, ans=0.2 2023-11-18 12:22:57,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=224833.33333333334, ans=0.0 2023-11-18 12:23:06,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=224900.0, ans=0.125 2023-11-18 12:23:08,218 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.42 vs. limit=15.0 2023-11-18 12:23:14,138 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 9700, loss[loss=0.1097, simple_loss=0.1219, pruned_loss=0.03825, audio_tagging_loss=0.01047, over 15104.00 frames. 
], tot_loss[loss=0.1169, simple_loss=0.1282, pruned_loss=0.04083, audio_tagging_loss=0.012, over 3041963.01 frames. ], batch size: 55, lr: 1.95e-02, grad_scale: 64.0 2023-11-18 12:23:15,673 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.76 vs. limit=15.0 2023-11-18 12:23:16,563 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.546e-03 2023-11-18 12:23:17,715 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.59 vs. limit=12.0 2023-11-18 12:23:24,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=225033.33333333334, ans=0.0 2023-11-18 12:23:32,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=225033.33333333334, ans=0.0 2023-11-18 12:23:41,759 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.68 vs. limit=22.5 2023-11-18 12:24:09,678 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 9750, loss[loss=0.1044, simple_loss=0.1145, pruned_loss=0.03746, audio_tagging_loss=0.009646, over 15575.00 frames. ], tot_loss[loss=0.1164, simple_loss=0.1279, pruned_loss=0.04067, audio_tagging_loss=0.01182, over 3039053.73 frames. ], batch size: 61, lr: 1.95e-02, grad_scale: 64.0 2023-11-18 12:24:14,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=225300.0, ans=0.1 2023-11-18 12:24:21,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=225366.66666666666, ans=0.125 2023-11-18 12:24:30,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=225366.66666666666, ans=0.025 2023-11-18 12:24:34,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=225433.33333333334, ans=0.0 2023-11-18 12:24:35,503 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.70 vs. limit=15.0 2023-11-18 12:24:38,100 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 9.876e+01 1.096e+02 1.307e+02 1.863e+02, threshold=2.192e+02, percent-clipped=0.0 2023-11-18 12:24:42,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=225500.0, ans=0.0 2023-11-18 12:25:06,226 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 9800, loss[loss=0.06852, simple_loss=0.07201, pruned_loss=0.01803, audio_tagging_loss=0.01448, over 16571.00 frames. ], tot_loss[loss=0.1156, simple_loss=0.1272, pruned_loss=0.04028, audio_tagging_loss=0.01178, over 3039186.18 frames. ], batch size: 65, lr: 1.95e-02, grad_scale: 64.0 2023-11-18 12:25:13,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=225633.33333333334, ans=0.04949747468305833 2023-11-18 12:25:43,113 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.21 vs. 
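limit=12.0

The optim.py:476 lines above report five summary statistics of recent gradient norms (min, 25%, median, 75%, max). Throughout this section the logged threshold equals twice the logged median (for example 2.0 x 1.096e+02 = 2.192e+02 just above, matching Clipping_scale=2.0), which suggests gradients are clipped at clipping_scale times a running median norm. A minimal sketch of that rule, assumed rather than taken from optim.py:

```python
import torch

def clip_to_scaled_median(parameters, norm_history, clipping_scale=2.0):
    """Clip gradients at clipping_scale * median of recently observed norms."""
    grads = [p.grad for p in parameters if p.grad is not None]
    total_norm = torch.linalg.vector_norm(
        torch.stack([torch.linalg.vector_norm(g) for g in grads]))
    norm_history.append(total_norm.item())            # running history of norms
    threshold = clipping_scale * torch.tensor(norm_history).median()
    if total_norm > threshold:                        # "percent-clipped" counts these
        for g in grads:
            g.mul_(threshold / total_norm)            # rescale gradients in place
    return total_norm, threshold
```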
2023-11-18 12:25:55,525 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 12:26:01,873 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 9850, loss[loss=0.1261, simple_loss=0.1542, pruned_loss=0.03913, audio_tagging_loss=0.009905, over 15523.00 frames. ], tot_loss[loss=0.1163, simple_loss=0.1278, pruned_loss=0.04067, audio_tagging_loss=0.01178, over 3038478.56 frames. ], batch size: 56, lr: 1.95e-02, grad_scale: 64.0 2023-11-18 12:26:03,458 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.99 vs. limit=22.5 2023-11-18 12:26:18,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=226033.33333333334, ans=0.1 2023-11-18 12:26:20,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=226033.33333333334, ans=0.125 2023-11-18 12:26:24,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=226100.0, ans=0.125 2023-11-18 12:26:26,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=226100.0, ans=0.1 2023-11-18 12:26:26,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=226100.0, ans=0.0 2023-11-18 12:26:30,008 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.673e+01 9.358e+01 1.054e+02 1.143e+02 1.553e+02, threshold=2.108e+02, percent-clipped=0.0 2023-11-18 12:26:44,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=226166.66666666666, ans=0.0 2023-11-18 12:26:48,027 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.01 vs. limit=22.5 2023-11-18 12:26:57,537 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 9900, loss[loss=0.1474, simple_loss=0.1742, pruned_loss=0.0534, audio_tagging_loss=0.006894, over 15039.00 frames. ], tot_loss[loss=0.116, simple_loss=0.1277, pruned_loss=0.04029, audio_tagging_loss=0.01182, over 3034245.55 frames. ], batch size: 54, lr: 1.95e-02, grad_scale: 64.0 2023-11-18 12:27:04,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=226300.0, ans=0.125 2023-11-18 12:27:12,911 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.70 vs. limit=22.5 2023-11-18 12:27:13,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=226366.66666666666, ans=0.125 2023-11-18 12:27:31,549 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.34 vs.
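limit=22.5

The exclusion warning above is a feasibility filter: a transducer loss needs at least as many encoder frames as output tokens, and after roughly 4x subsampling the 100 input frames survive as only 23, which cannot align to 24 BPE tokens. A sketch of the check (the exact subsampling arithmetic here is an assumption that reproduces 100 -> 23):

```python
# Assumed illustration of the filter behind the "Exclude cut" warnings;
# not the literal train_asr.py code.
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    frames_after_subsampling = (num_frames - 8) // 4  # 100 -> 23 for this recipe
    return frames_after_subsampling >= num_tokens     # transducer needs T' >= U

print(keep_cut(100, 24))  # False: the cut is dropped, as logged above
```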
2023-11-18 12:27:43,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=226566.66666666666, ans=0.1 2023-11-18 12:27:53,606 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 9950, loss[loss=0.08253, simple_loss=0.09338, pruned_loss=0.02237, audio_tagging_loss=0.01347, over 15560.00 frames. ], tot_loss[loss=0.1167, simple_loss=0.1288, pruned_loss=0.04056, audio_tagging_loss=0.01171, over 3038227.39 frames. ], batch size: 58, lr: 1.95e-02, grad_scale: 64.0 2023-11-18 12:27:55,067 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.21 vs. limit=15.0 2023-11-18 12:28:07,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=226700.0, ans=0.125 2023-11-18 12:28:13,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=226700.0, ans=0.2 2023-11-18 12:28:18,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=226766.66666666666, ans=0.0 2023-11-18 12:28:20,834 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.751e+01 9.736e+01 1.077e+02 1.172e+02 1.969e+02, threshold=2.153e+02, percent-clipped=0.0 2023-11-18 12:28:32,996 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.59 vs. limit=15.0 2023-11-18 12:28:35,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=226833.33333333334, ans=0.125 2023-11-18 12:28:35,703 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.95 vs. limit=15.0 2023-11-18 12:28:36,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=226833.33333333334, ans=0.0 2023-11-18 12:28:49,503 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 10000, loss[loss=0.1258, simple_loss=0.1298, pruned_loss=0.04412, audio_tagging_loss=0.01683, over 15394.00 frames. ], tot_loss[loss=0.1159, simple_loss=0.1278, pruned_loss=0.04024, audio_tagging_loss=0.01176, over 3036844.02 frames. ], batch size: 57, lr: 1.94e-02, grad_scale: 64.0 2023-11-18 12:28:50,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=226966.66666666666, ans=0.125 2023-11-18 12:28:50,977 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=15.0 2023-11-18 12:29:14,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=227100.0, ans=0.95 2023-11-18 12:29:18,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=227100.0, ans=0.125 2023-11-18 12:29:37,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=227233.33333333334, ans=0.125 2023-11-18 12:29:44,580 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 10050, loss[loss=0.1093, simple_loss=0.1255, pruned_loss=0.03456, audio_tagging_loss=0.01205, over 16388.00 frames.
], tot_loss[loss=0.1151, simple_loss=0.1268, pruned_loss=0.03972, audio_tagging_loss=0.01196, over 3041572.84 frames. ], batch size: 63, lr: 1.94e-02, grad_scale: 64.0 2023-11-18 12:29:46,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=227300.0, ans=0.125 2023-11-18 12:29:53,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=227300.0, ans=0.0 2023-11-18 12:29:56,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=227366.66666666666, ans=0.125 2023-11-18 12:29:57,146 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.90 vs. limit=8.0 2023-11-18 12:30:01,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=227366.66666666666, ans=0.07 2023-11-18 12:30:04,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=227366.66666666666, ans=0.95 2023-11-18 12:30:12,746 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.14 vs. limit=10.0 2023-11-18 12:30:13,277 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 9.408e+01 1.046e+02 1.122e+02 1.934e+02, threshold=2.091e+02, percent-clipped=0.0 2023-11-18 12:30:41,429 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 10100, loss[loss=0.1288, simple_loss=0.1424, pruned_loss=0.04652, audio_tagging_loss=0.01112, over 15193.00 frames. ], tot_loss[loss=0.1149, simple_loss=0.1265, pruned_loss=0.03973, audio_tagging_loss=0.01197, over 3043444.74 frames. ], batch size: 55, lr: 1.94e-02, grad_scale: 64.0 2023-11-18 12:30:42,044 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.37 vs. limit=15.0 2023-11-18 12:31:14,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=227833.33333333334, ans=0.0 2023-11-18 12:31:18,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=227833.33333333334, ans=0.125 2023-11-18 12:31:21,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=227833.33333333334, ans=0.125 2023-11-18 12:31:25,794 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 12:31:36,834 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 10150, loss[loss=0.08841, simple_loss=0.08673, pruned_loss=0.02982, audio_tagging_loss=0.01523, over 14401.00 frames. ], tot_loss[loss=0.1154, simple_loss=0.1269, pruned_loss=0.03999, audio_tagging_loss=0.01199, over 3046292.19 frames. 
], batch size: 55, lr: 1.94e-02, grad_scale: 64.0 2023-11-18 12:32:01,852 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 12:32:04,447 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 9.918e+01 1.075e+02 1.229e+02 2.012e+02, threshold=2.150e+02, percent-clipped=0.0 2023-11-18 12:32:29,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=228233.33333333334, ans=0.2 2023-11-18 12:32:32,042 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 10200, loss[loss=0.1117, simple_loss=0.1283, pruned_loss=0.0366, audio_tagging_loss=0.011, over 14682.00 frames. ], tot_loss[loss=0.1143, simple_loss=0.1254, pruned_loss=0.03952, audio_tagging_loss=0.01211, over 3052339.45 frames. ], batch size: 55, lr: 1.94e-02, grad_scale: 64.0 2023-11-18 12:32:45,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=228366.66666666666, ans=0.2 2023-11-18 12:32:49,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=228366.66666666666, ans=0.125 2023-11-18 12:32:52,223 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 12:33:03,710 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.68 vs. limit=22.5 2023-11-18 12:33:27,630 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 10250, loss[loss=0.1123, simple_loss=0.1269, pruned_loss=0.03663, audio_tagging_loss=0.01219, over 16154.00 frames. ], tot_loss[loss=0.1147, simple_loss=0.1257, pruned_loss=0.03968, audio_tagging_loss=0.01214, over 3052048.82 frames. ], batch size: 59, lr: 1.94e-02, grad_scale: 64.0 2023-11-18 12:33:51,113 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 12:33:55,204 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.485e+01 9.506e+01 1.026e+02 1.188e+02 1.534e+02, threshold=2.052e+02, percent-clipped=0.0 2023-11-18 12:33:58,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=228766.66666666666, ans=0.125 2023-11-18 12:34:03,265 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.53 vs. limit=15.0 2023-11-18 12:34:23,494 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 10300, loss[loss=0.09783, simple_loss=0.1075, pruned_loss=0.03295, audio_tagging_loss=0.01115, over 14647.00 frames. 
], tot_loss[loss=0.1153, simple_loss=0.1257, pruned_loss=0.04016, audio_tagging_loss=0.01227, over 3051868.72 frames. ], batch size: 56, lr: 1.94e-02, grad_scale: 64.0 2023-11-18 12:34:36,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=229033.33333333334, ans=0.0 2023-11-18 12:34:37,585 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.257e+00 2023-11-18 12:34:45,259 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.97 vs. limit=6.0 2023-11-18 12:34:56,105 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.280e+00 2023-11-18 12:35:05,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=229166.66666666666, ans=0.0 2023-11-18 12:35:07,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=229233.33333333334, ans=0.0 2023-11-18 12:35:11,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=229233.33333333334, ans=0.1 2023-11-18 12:35:13,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=229233.33333333334, ans=0.2 2023-11-18 12:35:18,605 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 10350, loss[loss=0.1123, simple_loss=0.1355, pruned_loss=0.03353, audio_tagging_loss=0.01097, over 15534.00 frames. ], tot_loss[loss=0.1159, simple_loss=0.127, pruned_loss=0.04025, audio_tagging_loss=0.01219, over 3053543.07 frames. ], batch size: 56, lr: 1.94e-02, grad_scale: 64.0 2023-11-18 12:35:26,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=229300.0, ans=0.0 2023-11-18 12:35:47,380 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.929e+01 9.749e+01 1.144e+02 1.287e+02 1.806e+02, threshold=2.288e+02, percent-clipped=0.0 2023-11-18 12:36:04,760 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.71 vs. limit=12.0 2023-11-18 12:36:10,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=229566.66666666666, ans=10.0 2023-11-18 12:36:14,275 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 10400, loss[loss=0.118, simple_loss=0.1347, pruned_loss=0.03874, audio_tagging_loss=0.01186, over 15351.00 frames. ], tot_loss[loss=0.1158, simple_loss=0.1268, pruned_loss=0.04008, audio_tagging_loss=0.01237, over 3049859.10 frames. 
], batch size: 56, lr: 1.93e-02, grad_scale: 64.0 2023-11-18 12:36:16,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=229633.33333333334, ans=0.1 2023-11-18 12:36:26,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=229700.0, ans=0.0 2023-11-18 12:36:26,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=229700.0, ans=15.0 2023-11-18 12:36:33,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=229700.0, ans=0.125 2023-11-18 12:36:54,282 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.98 vs. limit=12.0 2023-11-18 12:37:10,521 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 10450, loss[loss=0.103, simple_loss=0.1163, pruned_loss=0.03431, audio_tagging_loss=0.01056, over 13394.00 frames. ], tot_loss[loss=0.1145, simple_loss=0.1256, pruned_loss=0.03941, audio_tagging_loss=0.01231, over 3049588.56 frames. ], batch size: 53, lr: 1.93e-02, grad_scale: 64.0 2023-11-18 12:37:27,844 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.20 vs. limit=10.0 2023-11-18 12:37:32,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=230100.0, ans=0.125 2023-11-18 12:37:37,310 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.707e+01 9.254e+01 9.863e+01 1.141e+02 1.786e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-18 12:38:01,818 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.31 vs. limit=15.0 2023-11-18 12:38:05,266 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 10500, loss[loss=0.1244, simple_loss=0.1333, pruned_loss=0.04775, audio_tagging_loss=0.01003, over 15859.00 frames. ], tot_loss[loss=0.114, simple_loss=0.1252, pruned_loss=0.03927, audio_tagging_loss=0.01213, over 3043082.95 frames. ], batch size: 59, lr: 1.93e-02, grad_scale: 64.0 2023-11-18 12:38:08,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=230300.0, ans=0.125 2023-11-18 12:38:14,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=230300.0, ans=0.125 2023-11-18 12:38:47,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=230500.0, ans=0.1 2023-11-18 12:38:51,362 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.57 vs. limit=15.0 2023-11-18 12:39:00,311 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 10550, loss[loss=0.1142, simple_loss=0.1235, pruned_loss=0.04041, audio_tagging_loss=0.01199, over 14683.00 frames. ], tot_loss[loss=0.1139, simple_loss=0.1251, pruned_loss=0.03934, audio_tagging_loss=0.01204, over 3043604.61 frames. 
], batch size: 54, lr: 1.93e-02, grad_scale: 64.0 2023-11-18 12:39:00,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=230633.33333333334, ans=0.1 2023-11-18 12:39:03,461 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.69 vs. limit=6.0 2023-11-18 12:39:05,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=230633.33333333334, ans=0.1 2023-11-18 12:39:11,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=230700.0, ans=0.1 2023-11-18 12:39:18,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=230700.0, ans=0.125 2023-11-18 12:39:20,384 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.59 vs. limit=6.0 2023-11-18 12:39:29,321 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.627e+01 9.555e+01 1.067e+02 1.229e+02 1.948e+02, threshold=2.135e+02, percent-clipped=0.0 2023-11-18 12:39:56,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=230966.66666666666, ans=0.0 2023-11-18 12:39:56,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=230966.66666666666, ans=0.1 2023-11-18 12:39:56,904 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 10600, loss[loss=0.09661, simple_loss=0.1109, pruned_loss=0.02893, audio_tagging_loss=0.0122, over 15820.00 frames. ], tot_loss[loss=0.1139, simple_loss=0.1251, pruned_loss=0.03941, audio_tagging_loss=0.01189, over 3045604.47 frames. ], batch size: 58, lr: 1.93e-02, grad_scale: 64.0 2023-11-18 12:40:07,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=231033.33333333334, ans=0.125 2023-11-18 12:40:17,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=231033.33333333334, ans=0.125 2023-11-18 12:40:27,774 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.28 vs. limit=10.0 2023-11-18 12:40:28,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=231100.0, ans=0.125 2023-11-18 12:40:29,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=231166.66666666666, ans=10.0 2023-11-18 12:40:29,732 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.68 vs. limit=15.0 2023-11-18 12:40:52,923 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 10650, loss[loss=0.1378, simple_loss=0.1439, pruned_loss=0.05107, audio_tagging_loss=0.01476, over 14548.00 frames. ], tot_loss[loss=0.1144, simple_loss=0.1256, pruned_loss=0.03976, audio_tagging_loss=0.01183, over 3054070.58 frames. 
], batch size: 53, lr: 1.93e-02, grad_scale: 64.0 2023-11-18 12:41:16,698 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.43 vs. limit=15.0 2023-11-18 12:41:20,433 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.385e+01 9.562e+01 1.044e+02 1.196e+02 1.427e+02, threshold=2.089e+02, percent-clipped=0.0 2023-11-18 12:41:24,812 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.85 vs. limit=22.5 2023-11-18 12:41:29,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=231500.0, ans=10.0 2023-11-18 12:41:32,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=231500.0, ans=0.2 2023-11-18 12:41:33,961 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.75 vs. limit=15.0 2023-11-18 12:41:48,154 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 10700, loss[loss=0.108, simple_loss=0.1061, pruned_loss=0.03639, audio_tagging_loss=0.01862, over 16008.00 frames. ], tot_loss[loss=0.1154, simple_loss=0.1267, pruned_loss=0.04032, audio_tagging_loss=0.01178, over 3057233.96 frames. ], batch size: 61, lr: 1.93e-02, grad_scale: 64.0 2023-11-18 12:41:49,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=231633.33333333334, ans=0.95 2023-11-18 12:42:03,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=231700.0, ans=0.025 2023-11-18 12:42:23,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=231833.33333333334, ans=0.5 2023-11-18 12:42:42,616 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.27 vs. limit=15.0 2023-11-18 12:42:44,287 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 10750, loss[loss=0.09423, simple_loss=0.09146, pruned_loss=0.03165, audio_tagging_loss=0.01686, over 17899.00 frames. ], tot_loss[loss=0.1149, simple_loss=0.1263, pruned_loss=0.03993, audio_tagging_loss=0.01184, over 3061157.53 frames. ], batch size: 71, lr: 1.92e-02, grad_scale: 64.0 2023-11-18 12:42:47,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=231966.66666666666, ans=0.125 2023-11-18 12:43:07,569 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.94 vs. limit=22.5 2023-11-18 12:43:12,204 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.130e+01 9.211e+01 1.033e+02 1.162e+02 1.735e+02, threshold=2.066e+02, percent-clipped=0.0 2023-11-18 12:43:17,552 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.78 vs. limit=6.0 2023-11-18 12:43:24,244 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.65 vs. 
limit=6.0 2023-11-18 12:43:40,221 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 10800, loss[loss=0.07476, simple_loss=0.07909, pruned_loss=0.02316, audio_tagging_loss=0.01205, over 15216.00 frames. ], tot_loss[loss=0.1139, simple_loss=0.1251, pruned_loss=0.03942, audio_tagging_loss=0.01192, over 3056450.87 frames. ], batch size: 58, lr: 1.92e-02, grad_scale: 128.0 2023-11-18 12:44:03,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=232433.33333333334, ans=0.125 2023-11-18 12:44:05,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=232433.33333333334, ans=0.125 2023-11-18 12:44:35,718 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 10850, loss[loss=0.1187, simple_loss=0.1368, pruned_loss=0.04028, audio_tagging_loss=0.009998, over 15818.00 frames. ], tot_loss[loss=0.1132, simple_loss=0.1243, pruned_loss=0.03906, audio_tagging_loss=0.01199, over 3055276.48 frames. ], batch size: 58, lr: 1.92e-02, grad_scale: 64.0 2023-11-18 12:44:48,527 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.100e-01 2023-11-18 12:45:04,525 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.924e+01 9.821e+01 1.081e+02 1.220e+02 1.822e+02, threshold=2.162e+02, percent-clipped=0.0 2023-11-18 12:45:15,333 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.70 vs. limit=22.5 2023-11-18 12:45:27,314 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 12:45:31,520 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 10900, loss[loss=0.1094, simple_loss=0.1087, pruned_loss=0.04048, audio_tagging_loss=0.01452, over 14067.00 frames. ], tot_loss[loss=0.1138, simple_loss=0.125, pruned_loss=0.03939, audio_tagging_loss=0.01195, over 3051419.98 frames. ], batch size: 53, lr: 1.92e-02, grad_scale: 64.0 2023-11-18 12:45:38,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=232966.66666666666, ans=0.0 2023-11-18 12:45:47,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=233033.33333333334, ans=0.125 2023-11-18 12:46:08,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=233166.66666666666, ans=0.0 2023-11-18 12:46:23,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=233233.33333333334, ans=0.125 2023-11-18 12:46:27,305 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 10950, loss[loss=0.1002, simple_loss=0.1056, pruned_loss=0.03357, audio_tagging_loss=0.01382, over 15628.00 frames. ], tot_loss[loss=0.1139, simple_loss=0.1252, pruned_loss=0.03931, audio_tagging_loss=0.01196, over 3056501.79 frames. 
], batch size: 59, lr: 1.92e-02, grad_scale: 64.0 2023-11-18 12:46:47,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=233366.66666666666, ans=0.125 2023-11-18 12:46:56,640 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.506e+01 9.530e+01 1.056e+02 1.170e+02 1.707e+02, threshold=2.112e+02, percent-clipped=0.0 2023-11-18 12:47:07,583 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.55 vs. limit=22.5 2023-11-18 12:47:09,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=233500.0, ans=0.0 2023-11-18 12:47:23,192 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 11000, loss[loss=0.1121, simple_loss=0.1251, pruned_loss=0.03891, audio_tagging_loss=0.0106, over 16099.00 frames. ], tot_loss[loss=0.1138, simple_loss=0.1252, pruned_loss=0.03922, audio_tagging_loss=0.012, over 3051843.23 frames. ], batch size: 59, lr: 1.92e-02, grad_scale: 64.0 2023-11-18 12:47:32,154 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 12:48:00,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=233833.33333333334, ans=0.1 2023-11-18 12:48:03,796 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.32 vs. limit=15.0 2023-11-18 12:48:17,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=233900.0, ans=0.1 2023-11-18 12:48:19,018 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 11050, loss[loss=0.08987, simple_loss=0.09763, pruned_loss=0.02686, audio_tagging_loss=0.0142, over 15978.00 frames. ], tot_loss[loss=0.1144, simple_loss=0.1259, pruned_loss=0.0394, audio_tagging_loss=0.01203, over 3048493.19 frames. ], batch size: 62, lr: 1.92e-02, grad_scale: 64.0 2023-11-18 12:48:34,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=234033.33333333334, ans=0.125 2023-11-18 12:48:43,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=234100.0, ans=0.04949747468305833 2023-11-18 12:48:47,582 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 9.699e+01 1.056e+02 1.206e+02 1.867e+02, threshold=2.111e+02, percent-clipped=0.0 2023-11-18 12:48:55,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=234166.66666666666, ans=0.125 2023-11-18 12:48:56,017 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.96 vs. 
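limit=12.0

The scaling.py:1022 Whitening lines, such as the one just above, compare a per-module whiteness statistic of the activations against a (scheduled) limit; values well above the limit indicate a lopsided covariance spectrum that the module then pushes back toward white. One way to define such a metric, equal to 1.0 for perfectly white features and growing with spectral imbalance, is sketched below (an illustration, not scaling.py's exact formula):

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """Mean squared eigenvalue of the feature covariance over the squared mean
    eigenvalue, computed via traces; >= 1.0, with 1.0 for white features."""
    x = x - x.mean(dim=0, keepdim=True)   # x: (num_frames, num_channels)
    cov = (x.T @ x) / x.shape[0]
    n = cov.shape[0]
    return float(n * (cov @ cov).trace() / cov.trace() ** 2)

print(whitening_metric(torch.randn(4000, 256)))  # close to 1.0 for white input
```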
2023-11-18 12:48:58,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=234166.66666666666, ans=0.125 2023-11-18 12:49:14,673 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 11100, loss[loss=0.09369, simple_loss=0.09114, pruned_loss=0.03001, audio_tagging_loss=0.01811, over 15949.00 frames. ], tot_loss[loss=0.1152, simple_loss=0.1267, pruned_loss=0.03979, audio_tagging_loss=0.01211, over 3054580.12 frames. ], batch size: 62, lr: 1.92e-02, grad_scale: 64.0 2023-11-18 12:49:33,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=234366.66666666666, ans=0.0 2023-11-18 12:49:59,608 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.69 vs. limit=15.0 2023-11-18 12:50:07,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=234566.66666666666, ans=0.125 2023-11-18 12:50:09,605 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 11150, loss[loss=0.08766, simple_loss=0.09097, pruned_loss=0.02781, audio_tagging_loss=0.01437, over 14759.00 frames. ], tot_loss[loss=0.115, simple_loss=0.1262, pruned_loss=0.0397, audio_tagging_loss=0.01218, over 3053546.96 frames. ], batch size: 57, lr: 1.91e-02, grad_scale: 64.0 2023-11-18 12:50:18,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=234633.33333333334, ans=0.125 2023-11-18 12:50:23,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=234700.0, ans=0.2 2023-11-18 12:50:32,057 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.72 vs. limit=15.0 2023-11-18 12:50:38,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=234766.66666666666, ans=0.0 2023-11-18 12:50:39,490 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.809e+01 9.879e+01 1.104e+02 1.293e+02 2.710e+02, threshold=2.209e+02, percent-clipped=1.0 2023-11-18 12:50:39,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=234766.66666666666, ans=0.2 2023-11-18 12:50:41,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=234766.66666666666, ans=0.125 2023-11-18 12:51:06,384 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 11200, loss[loss=0.1129, simple_loss=0.1294, pruned_loss=0.03642, audio_tagging_loss=0.01173, over 15279.00 frames. ], tot_loss[loss=0.1149, simple_loss=0.1259, pruned_loss=0.03967, audio_tagging_loss=0.0123, over 3044806.88 frames. ], batch size: 56, lr: 1.91e-02, grad_scale: 64.0 2023-11-18 12:51:17,070 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.68 vs.
limit=15.0 2023-11-18 12:51:20,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=235033.33333333334, ans=0.125 2023-11-18 12:51:31,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=235100.0, ans=0.125 2023-11-18 12:51:36,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=235100.0, ans=0.1 2023-11-18 12:51:43,058 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.94 vs. limit=12.0 2023-11-18 12:51:51,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=235233.33333333334, ans=0.1 2023-11-18 12:52:01,481 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 11250, loss[loss=0.1057, simple_loss=0.127, pruned_loss=0.03187, audio_tagging_loss=0.01032, over 15982.00 frames. ], tot_loss[loss=0.1139, simple_loss=0.1248, pruned_loss=0.03927, audio_tagging_loss=0.0122, over 3047600.79 frames. ], batch size: 59, lr: 1.91e-02, grad_scale: 32.0 2023-11-18 12:52:03,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=235300.0, ans=0.125 2023-11-18 12:52:31,471 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.992e+01 9.651e+01 1.080e+02 1.306e+02 2.369e+02, threshold=2.160e+02, percent-clipped=1.0 2023-11-18 12:52:46,425 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2023-11-18 12:52:51,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=235566.66666666666, ans=10.0 2023-11-18 12:52:56,392 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 11300, loss[loss=0.102, simple_loss=0.1146, pruned_loss=0.03269, audio_tagging_loss=0.01206, over 15011.00 frames. ], tot_loss[loss=0.1156, simple_loss=0.127, pruned_loss=0.04015, audio_tagging_loss=0.01196, over 3046196.80 frames. ], batch size: 56, lr: 1.91e-02, grad_scale: 32.0 2023-11-18 12:53:02,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=235633.33333333334, ans=0.0 2023-11-18 12:53:35,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=235833.33333333334, ans=0.125 2023-11-18 12:53:44,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=235900.0, ans=15.0 2023-11-18 12:53:52,130 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 11350, loss[loss=0.1066, simple_loss=0.1137, pruned_loss=0.0389, audio_tagging_loss=0.01082, over 15922.00 frames. ], tot_loss[loss=0.1153, simple_loss=0.1268, pruned_loss=0.04004, audio_tagging_loss=0.01181, over 3049366.86 frames. 
], batch size: 62, lr: 1.91e-02, grad_scale: 32.0 2023-11-18 12:54:00,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=235966.66666666666, ans=0.125 2023-11-18 12:54:01,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=235966.66666666666, ans=0.125 2023-11-18 12:54:15,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=236100.0, ans=0.1 2023-11-18 12:54:21,997 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.976e+01 9.667e+01 1.093e+02 1.238e+02 1.995e+02, threshold=2.185e+02, percent-clipped=0.0 2023-11-18 12:54:26,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=236166.66666666666, ans=0.125 2023-11-18 12:54:35,311 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.67 vs. limit=15.0 2023-11-18 12:54:48,556 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 11400, loss[loss=0.137, simple_loss=0.1637, pruned_loss=0.04672, audio_tagging_loss=0.008446, over 15022.00 frames. ], tot_loss[loss=0.1146, simple_loss=0.1262, pruned_loss=0.03967, audio_tagging_loss=0.01179, over 3044757.07 frames. ], batch size: 53, lr: 1.91e-02, grad_scale: 32.0 2023-11-18 12:55:07,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=236366.66666666666, ans=0.2 2023-11-18 12:55:07,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=236366.66666666666, ans=0.125 2023-11-18 12:55:20,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=236500.0, ans=0.125 2023-11-18 12:55:37,386 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.06 vs. limit=22.5 2023-11-18 12:55:43,121 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 11450, loss[loss=0.1053, simple_loss=0.1046, pruned_loss=0.03599, audio_tagging_loss=0.01704, over 16411.00 frames. ], tot_loss[loss=0.1149, simple_loss=0.1267, pruned_loss=0.03975, audio_tagging_loss=0.01178, over 3045278.25 frames. ], batch size: 66, lr: 1.91e-02, grad_scale: 32.0 2023-11-18 12:55:45,741 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.03 vs. 
limit=15.0 2023-11-18 12:56:05,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=236766.66666666666, ans=0.125 2023-11-18 12:56:13,499 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.236e+01 9.179e+01 9.957e+01 1.104e+02 1.348e+02, threshold=1.991e+02, percent-clipped=0.0 2023-11-18 12:56:18,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=236833.33333333334, ans=0.125 2023-11-18 12:56:21,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=236833.33333333334, ans=0.0 2023-11-18 12:56:36,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=236900.0, ans=0.125 2023-11-18 12:56:38,367 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 11500, loss[loss=0.1425, simple_loss=0.1561, pruned_loss=0.05334, audio_tagging_loss=0.01111, over 15594.00 frames. ], tot_loss[loss=0.1146, simple_loss=0.1267, pruned_loss=0.03955, audio_tagging_loss=0.01168, over 3047982.11 frames. ], batch size: 58, lr: 1.91e-02, grad_scale: 32.0 2023-11-18 12:56:42,642 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.94 vs. limit=15.0 2023-11-18 12:56:54,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=237033.33333333334, ans=0.0 2023-11-18 12:57:05,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=237100.0, ans=0.125 2023-11-18 12:57:29,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=237233.33333333334, ans=0.2 2023-11-18 12:57:35,031 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 11550, loss[loss=0.07882, simple_loss=0.07704, pruned_loss=0.0262, audio_tagging_loss=0.0141, over 16075.00 frames. ], tot_loss[loss=0.1145, simple_loss=0.1265, pruned_loss=0.03949, audio_tagging_loss=0.01174, over 3043838.22 frames. ], batch size: 63, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 12:57:46,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=237366.66666666666, ans=0.2 2023-11-18 12:57:51,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=237366.66666666666, ans=0.125 2023-11-18 12:58:04,250 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 9.411e+01 1.015e+02 1.135e+02 1.692e+02, threshold=2.029e+02, percent-clipped=0.0 2023-11-18 12:58:07,589 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.14 vs. limit=22.5 2023-11-18 12:58:08,069 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 12:58:10,614 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.42 vs. limit=15.0 2023-11-18 12:58:11,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=237500.0, ans=0.125 2023-11-18 12:58:24,271 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0 2023-11-18 12:58:30,019 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 11600, loss[loss=0.1222, simple_loss=0.1279, pruned_loss=0.04812, audio_tagging_loss=0.01014, over 15279.00 frames. ], tot_loss[loss=0.1148, simple_loss=0.1271, pruned_loss=0.03954, audio_tagging_loss=0.01172, over 3049953.38 frames. ], batch size: 56, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 12:58:30,563 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.68 vs. limit=15.0 2023-11-18 12:58:43,877 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.40 vs. limit=22.5 2023-11-18 12:58:45,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=237700.0, ans=0.0 2023-11-18 12:58:51,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=237766.66666666666, ans=0.125 2023-11-18 12:58:59,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=237766.66666666666, ans=0.0 2023-11-18 12:59:03,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=237833.33333333334, ans=0.125 2023-11-18 12:59:05,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=237833.33333333334, ans=0.2 2023-11-18 12:59:07,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=237833.33333333334, ans=0.125 2023-11-18 12:59:09,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=237833.33333333334, ans=0.125 2023-11-18 12:59:12,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=237833.33333333334, ans=0.0 2023-11-18 12:59:12,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=237833.33333333334, ans=0.125 2023-11-18 12:59:18,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=237900.0, ans=0.1 2023-11-18 12:59:24,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=237966.66666666666, ans=0.125 2023-11-18 12:59:25,496 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 11650, loss[loss=0.1067, simple_loss=0.1166, pruned_loss=0.03305, audio_tagging_loss=0.01538, over 14825.00 frames. ], tot_loss[loss=0.115, simple_loss=0.1274, pruned_loss=0.03948, audio_tagging_loss=0.01182, over 3055438.27 frames. 
], batch size: 56, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 12:59:25,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=237966.66666666666, ans=0.035 2023-11-18 12:59:33,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=237966.66666666666, ans=0.0 2023-11-18 12:59:35,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=238033.33333333334, ans=0.125 2023-11-18 12:59:55,508 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 9.411e+01 1.037e+02 1.164e+02 1.752e+02, threshold=2.075e+02, percent-clipped=0.0 2023-11-18 13:00:03,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=238166.66666666666, ans=0.025 2023-11-18 13:00:12,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=238233.33333333334, ans=0.2 2023-11-18 13:00:20,932 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 11700, loss[loss=0.1146, simple_loss=0.1263, pruned_loss=0.03908, audio_tagging_loss=0.0124, over 15556.00 frames. ], tot_loss[loss=0.1153, simple_loss=0.1273, pruned_loss=0.03981, audio_tagging_loss=0.01187, over 3056063.14 frames. ], batch size: 56, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 13:00:24,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=238300.0, ans=0.0 2023-11-18 13:00:24,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=238300.0, ans=0.1 2023-11-18 13:00:55,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=238500.0, ans=0.125 2023-11-18 13:01:03,630 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.72 vs. limit=15.0 2023-11-18 13:01:09,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=238566.66666666666, ans=0.0 2023-11-18 13:01:16,338 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 11750, loss[loss=0.1027, simple_loss=0.1248, pruned_loss=0.02953, audio_tagging_loss=0.0108, over 14401.00 frames. ], tot_loss[loss=0.1153, simple_loss=0.1273, pruned_loss=0.03983, audio_tagging_loss=0.01185, over 3050738.44 frames. 
], batch size: 52, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 13:01:23,086 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:01:24,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=238633.33333333334, ans=0.0 2023-11-18 13:01:30,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=238700.0, ans=10.0 2023-11-18 13:01:33,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=238700.0, ans=0.0 2023-11-18 13:01:46,323 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.614e+01 1.047e+02 1.164e+02 1.458e+02 1.909e+02, threshold=2.328e+02, percent-clipped=0.0 2023-11-18 13:02:00,128 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.70 vs. limit=22.5 2023-11-18 13:02:11,167 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 11800, loss[loss=0.1002, simple_loss=0.1103, pruned_loss=0.03564, audio_tagging_loss=0.009437, over 14849.00 frames. ], tot_loss[loss=0.1145, simple_loss=0.1263, pruned_loss=0.03941, audio_tagging_loss=0.01193, over 3049609.32 frames. ], batch size: 55, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 13:02:15,336 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.49 vs. limit=15.0 2023-11-18 13:02:18,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=238966.66666666666, ans=0.125 2023-11-18 13:02:22,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=239033.33333333334, ans=0.04949747468305833 2023-11-18 13:02:35,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=239100.0, ans=0.2 2023-11-18 13:02:53,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=239166.66666666666, ans=0.0 2023-11-18 13:03:07,106 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 11850, loss[loss=0.1462, simple_loss=0.1583, pruned_loss=0.05482, audio_tagging_loss=0.0122, over 14922.00 frames. ], tot_loss[loss=0.1139, simple_loss=0.1255, pruned_loss=0.0391, audio_tagging_loss=0.01202, over 3046530.83 frames. ], batch size: 56, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 13:03:10,259 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.63 vs. limit=15.0 2023-11-18 13:03:20,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=239366.66666666666, ans=0.125 2023-11-18 13:03:22,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=239366.66666666666, ans=0.0 2023-11-18 13:03:22,174 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.48 vs. 
limit=15.0 2023-11-18 13:03:23,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=239366.66666666666, ans=0.1 2023-11-18 13:03:28,504 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.54 vs. limit=6.0 2023-11-18 13:03:33,396 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:03:36,859 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.964e+01 9.657e+01 1.079e+02 1.246e+02 1.721e+02, threshold=2.158e+02, percent-clipped=0.0 2023-11-18 13:03:40,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=239500.0, ans=0.0 2023-11-18 13:03:54,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=239566.66666666666, ans=22.5 2023-11-18 13:03:54,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=239566.66666666666, ans=0.1 2023-11-18 13:04:02,551 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 11900, loss[loss=0.09485, simple_loss=0.09875, pruned_loss=0.02549, audio_tagging_loss=0.01998, over 14951.00 frames. ], tot_loss[loss=0.1135, simple_loss=0.1247, pruned_loss=0.03891, audio_tagging_loss=0.0122, over 3053060.88 frames. ], batch size: 56, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 13:04:17,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=239700.0, ans=0.0 2023-11-18 13:04:21,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=239700.0, ans=0.125 2023-11-18 13:04:30,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=239766.66666666666, ans=0.0 2023-11-18 13:04:31,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=239766.66666666666, ans=0.0 2023-11-18 13:04:32,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=239766.66666666666, ans=0.0 2023-11-18 13:04:38,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=239833.33333333334, ans=0.07 2023-11-18 13:04:53,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=239900.0, ans=0.125 2023-11-18 13:04:57,148 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 11950, loss[loss=0.1039, simple_loss=0.118, pruned_loss=0.03177, audio_tagging_loss=0.01314, over 15169.00 frames. ], tot_loss[loss=0.1135, simple_loss=0.1248, pruned_loss=0.03878, audio_tagging_loss=0.01233, over 3050099.94 frames. ], batch size: 57, lr: 1.89e-02, grad_scale: 32.0 2023-11-18 13:05:27,273 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.49 vs. 
limit=15.0 2023-11-18 13:05:27,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=240100.0, ans=0.125 2023-11-18 13:05:29,706 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 9.537e+01 1.099e+02 1.271e+02 1.974e+02, threshold=2.199e+02, percent-clipped=0.0 2023-11-18 13:05:35,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=240166.66666666666, ans=0.125 2023-11-18 13:05:47,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=240233.33333333334, ans=0.1 2023-11-18 13:05:53,169 INFO [train_asr.py:1115] (3/4) Epoch 3, batch 12000, loss[loss=0.09765, simple_loss=0.1014, pruned_loss=0.03471, audio_tagging_loss=0.01222, over 15214.00 frames. ], tot_loss[loss=0.1124, simple_loss=0.1232, pruned_loss=0.03832, audio_tagging_loss=0.01249, over 3053948.41 frames. ], batch size: 59, lr: 1.89e-02, grad_scale: 32.0 2023-11-18 13:05:53,170 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-18 13:06:26,323 INFO [train_asr.py:1147] (3/4) Epoch 3, validation: loss=0.07855, simple_loss=0.06384, pruned_loss=0.01132, audio_tagging_loss=0.03531, over 4681554.00 frames. 2023-11-18 13:06:26,324 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-18 13:06:43,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=240366.66666666666, ans=0.1 2023-11-18 13:07:28,604 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 0, loss[loss=0.1123, simple_loss=0.1069, pruned_loss=0.02946, audio_tagging_loss=0.02936, over 15167.00 frames. ], tot_loss[loss=0.1123, simple_loss=0.1069, pruned_loss=0.02946, audio_tagging_loss=0.02936, over 15167.00 frames. ], batch size: 57, lr: 1.77e-02, grad_scale: 32.0 2023-11-18 13:07:28,605 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-18 13:08:00,452 INFO [train_asr.py:1147] (3/4) Epoch 4, validation: loss=0.07694, simple_loss=0.06378, pruned_loss=0.01116, audio_tagging_loss=0.03389, over 4681554.00 frames. 2023-11-18 13:08:00,453 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-18 13:08:00,732 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:08:14,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=240520.0, ans=0.125 2023-11-18 13:08:17,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=240520.0, ans=0.125 2023-11-18 13:08:23,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=240586.66666666666, ans=0.125 2023-11-18 13:08:35,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=240653.33333333334, ans=0.125 2023-11-18 13:08:39,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=240653.33333333334, ans=0.125 2023-11-18 13:08:50,611 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.99 vs. 
limit=12.0 2023-11-18 13:08:55,949 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 50, loss[loss=0.1128, simple_loss=0.1098, pruned_loss=0.03313, audio_tagging_loss=0.02478, over 16364.00 frames. ], tot_loss[loss=0.1236, simple_loss=0.1242, pruned_loss=0.03837, audio_tagging_loss=0.02315, over 687652.01 frames. ], batch size: 65, lr: 1.77e-02, grad_scale: 32.0 2023-11-18 13:08:56,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=240786.66666666666, ans=0.1 2023-11-18 13:09:00,235 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.826e+01 9.935e+01 1.154e+02 1.332e+02 1.872e+02, threshold=2.308e+02, percent-clipped=0.0 2023-11-18 13:09:05,619 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.96 vs. limit=15.0 2023-11-18 13:09:15,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=240853.33333333334, ans=0.125 2023-11-18 13:09:19,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=240920.0, ans=0.125 2023-11-18 13:09:25,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=240920.0, ans=15.0 2023-11-18 13:09:29,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=240986.66666666666, ans=0.1 2023-11-18 13:09:36,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=240986.66666666666, ans=0.1 2023-11-18 13:09:46,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=241053.33333333334, ans=0.0 2023-11-18 13:09:51,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=241120.0, ans=0.0 2023-11-18 13:09:52,225 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 100, loss[loss=0.1328, simple_loss=0.1472, pruned_loss=0.04385, audio_tagging_loss=0.01532, over 15063.00 frames. ], tot_loss[loss=0.1221, simple_loss=0.1244, pruned_loss=0.03779, audio_tagging_loss=0.02211, over 1202217.90 frames. ], batch size: 54, lr: 1.77e-02, grad_scale: 32.0 2023-11-18 13:10:06,022 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.03 vs. limit=12.0 2023-11-18 13:10:06,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=241186.66666666666, ans=0.2 2023-11-18 13:10:26,812 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.20 vs. 
limit=22.5 2023-11-18 13:10:29,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=241320.0, ans=0.1 2023-11-18 13:10:38,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=241386.66666666666, ans=0.5 2023-11-18 13:10:47,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=241453.33333333334, ans=0.1 2023-11-18 13:10:47,605 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.62 vs. limit=15.0 2023-11-18 13:10:48,255 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 150, loss[loss=0.1281, simple_loss=0.1396, pruned_loss=0.04839, audio_tagging_loss=0.009938, over 14399.00 frames. ], tot_loss[loss=0.1218, simple_loss=0.1263, pruned_loss=0.0389, audio_tagging_loss=0.01971, over 1603984.49 frames. ], batch size: 56, lr: 1.77e-02, grad_scale: 32.0 2023-11-18 13:10:48,833 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.65 vs. limit=10.0 2023-11-18 13:10:52,415 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 9.521e+01 1.016e+02 1.130e+02 1.451e+02, threshold=2.033e+02, percent-clipped=0.0 2023-11-18 13:10:59,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=241520.0, ans=0.0 2023-11-18 13:11:08,423 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=15.0 2023-11-18 13:11:23,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=241653.33333333334, ans=0.1 2023-11-18 13:11:36,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=241720.0, ans=0.0 2023-11-18 13:11:44,091 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 200, loss[loss=0.09931, simple_loss=0.1066, pruned_loss=0.03519, audio_tagging_loss=0.01083, over 14395.00 frames. ], tot_loss[loss=0.12, simple_loss=0.1265, pruned_loss=0.03921, audio_tagging_loss=0.01749, over 1932966.46 frames. ], batch size: 56, lr: 1.76e-02, grad_scale: 32.0 2023-11-18 13:11:44,606 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.01 vs. limit=10.0 2023-11-18 13:11:54,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=241853.33333333334, ans=0.0 2023-11-18 13:12:04,075 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.55 vs. limit=5.0 2023-11-18 13:12:17,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=241986.66666666666, ans=0.2 2023-11-18 13:12:40,379 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 250, loss[loss=0.1365, simple_loss=0.1502, pruned_loss=0.04737, audio_tagging_loss=0.01399, over 16499.00 frames. ], tot_loss[loss=0.1176, simple_loss=0.1261, pruned_loss=0.03883, audio_tagging_loss=0.01571, over 2181643.68 frames. 
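[Editor's note] The many ScheduledFloat lines above report a value ("ans") looked up at the current batch_count: dropout probabilities, skip rates and balancer bounds are scheduled over training rather than fixed. A toy stand-in with piecewise-linear interpolation between breakpoints; the breakpoints below are invented for illustration and are not taken from the recipe:

class ScheduledFloatSketch:
    """A float that depends on batch_count, interpolated linearly between
    (batch_count, value) breakpoints; a rough stand-in for scaling.py's
    ScheduledFloat."""

    def __init__(self, *points):
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# e.g. a dropout that anneals from 0.3 to 0.1 over the first 20k batches:
dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
assert dropout_p(240786.0) == 0.1   # long past the last breakpoint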
], batch size: 63, lr: 1.76e-02, grad_scale: 16.0 2023-11-18 13:12:43,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=242120.0, ans=0.125 2023-11-18 13:12:45,591 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 9.626e+01 1.050e+02 1.196e+02 1.667e+02, threshold=2.101e+02, percent-clipped=0.0 2023-11-18 13:13:00,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=242186.66666666666, ans=0.2 2023-11-18 13:13:25,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=242386.66666666666, ans=0.125 2023-11-18 13:13:35,856 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 300, loss[loss=0.1227, simple_loss=0.136, pruned_loss=0.04378, audio_tagging_loss=0.0109, over 14516.00 frames. ], tot_loss[loss=0.1176, simple_loss=0.1279, pruned_loss=0.03927, audio_tagging_loss=0.0144, over 2375665.13 frames. ], batch size: 53, lr: 1.76e-02, grad_scale: 16.0 2023-11-18 13:13:54,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=242520.0, ans=0.0 2023-11-18 13:14:27,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=242720.0, ans=0.0 2023-11-18 13:14:27,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=242720.0, ans=0.0 2023-11-18 13:14:31,247 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 350, loss[loss=0.1434, simple_loss=0.151, pruned_loss=0.05644, audio_tagging_loss=0.01147, over 15597.00 frames. ], tot_loss[loss=0.1168, simple_loss=0.1281, pruned_loss=0.0392, audio_tagging_loss=0.01349, over 2524775.66 frames. ], batch size: 59, lr: 1.76e-02, grad_scale: 16.0 2023-11-18 13:14:37,027 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.919e+01 9.705e+01 1.099e+02 1.261e+02 1.880e+02, threshold=2.197e+02, percent-clipped=0.0 2023-11-18 13:14:53,292 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.65 vs. limit=15.0 2023-11-18 13:15:00,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=242920.0, ans=0.125 2023-11-18 13:15:01,125 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.70 vs. limit=15.0 2023-11-18 13:15:27,799 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 400, loss[loss=0.1357, simple_loss=0.1476, pruned_loss=0.05212, audio_tagging_loss=0.009794, over 15105.00 frames. ], tot_loss[loss=0.1157, simple_loss=0.1274, pruned_loss=0.03903, audio_tagging_loss=0.01296, over 2645148.01 frames. ], batch size: 55, lr: 1.76e-02, grad_scale: 32.0 2023-11-18 13:15:35,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=243120.0, ans=0.125 2023-11-18 13:16:22,998 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 450, loss[loss=0.09083, simple_loss=0.1029, pruned_loss=0.02926, audio_tagging_loss=0.01014, over 14466.00 frames. ], tot_loss[loss=0.1125, simple_loss=0.124, pruned_loss=0.03778, audio_tagging_loss=0.01276, over 2735608.80 frames. 
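[Editor's note] Each Whitening line above compares a "metric" against a "limit": the metric measures how far the covariance of a layer's activations (per channel group) is from a multiple of the identity, with 1.0 meaning perfectly decorrelated, and exceeding the limit triggers a corrective gradient. A rough reconstruction of such a metric, assuming mean-squared covariance energy relative to an isotropic baseline; the actual scaling.py computation may add smoothing terms and runs inside autograd:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    # x: (num_frames, num_channels) activations
    n, c = x.shape
    cpg = c // num_groups
    xg = x.reshape(n, num_groups, cpg).transpose(0, 1)   # (groups, n, cpg)
    xg = xg - xg.mean(dim=1, keepdim=True)
    cov = xg.transpose(1, 2) @ xg / n                    # (groups, cpg, cpg)
    num = (cov ** 2).sum(dim=(1, 2))                     # total covariance energy
    trace = torch.diagonal(cov, dim1=1, dim2=2).sum(dim=1)
    den = trace ** 2 / cpg                               # energy if cov were a*I
    return (num / den).mean()                            # 1.0 when fully whitened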
], batch size: 55, lr: 1.76e-02, grad_scale: 32.0 2023-11-18 13:16:28,335 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.979e+01 9.241e+01 1.029e+02 1.146e+02 1.664e+02, threshold=2.058e+02, percent-clipped=0.0 2023-11-18 13:16:31,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=243453.33333333334, ans=0.125 2023-11-18 13:16:37,663 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.64 vs. limit=10.0 2023-11-18 13:16:47,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=243586.66666666666, ans=0.07 2023-11-18 13:16:52,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=243586.66666666666, ans=0.125 2023-11-18 13:16:54,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=243586.66666666666, ans=0.125 2023-11-18 13:16:55,374 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.74 vs. limit=15.0 2023-11-18 13:16:59,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=243653.33333333334, ans=0.0 2023-11-18 13:17:01,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=243653.33333333334, ans=0.0 2023-11-18 13:17:02,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=243653.33333333334, ans=0.125 2023-11-18 13:17:18,760 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 500, loss[loss=0.1123, simple_loss=0.1215, pruned_loss=0.03982, audio_tagging_loss=0.01175, over 15087.00 frames. ], tot_loss[loss=0.1124, simple_loss=0.1244, pruned_loss=0.0379, audio_tagging_loss=0.01234, over 2806744.34 frames. ], batch size: 60, lr: 1.76e-02, grad_scale: 16.0 2023-11-18 13:17:26,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=243786.66666666666, ans=0.125 2023-11-18 13:17:49,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=243920.0, ans=0.09899494936611666 2023-11-18 13:17:49,630 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.65 vs. limit=15.0 2023-11-18 13:18:15,523 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 550, loss[loss=0.1299, simple_loss=0.1581, pruned_loss=0.0425, audio_tagging_loss=0.008363, over 15477.00 frames. ], tot_loss[loss=0.1135, simple_loss=0.1258, pruned_loss=0.03842, audio_tagging_loss=0.0122, over 2860976.55 frames. 
], batch size: 55, lr: 1.76e-02, grad_scale: 8.0 2023-11-18 13:18:19,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=244120.0, ans=0.125 2023-11-18 13:18:23,484 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 9.469e+01 1.045e+02 1.178e+02 1.805e+02, threshold=2.090e+02, percent-clipped=0.0 2023-11-18 13:18:32,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=244186.66666666666, ans=0.07 2023-11-18 13:18:36,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=244253.33333333334, ans=0.125 2023-11-18 13:18:51,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=244320.0, ans=0.125 2023-11-18 13:19:06,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=244386.66666666666, ans=0.125 2023-11-18 13:19:11,341 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 600, loss[loss=0.1109, simple_loss=0.117, pruned_loss=0.03948, audio_tagging_loss=0.01294, over 15995.00 frames. ], tot_loss[loss=0.1137, simple_loss=0.126, pruned_loss=0.03851, audio_tagging_loss=0.01214, over 2896799.57 frames. ], batch size: 61, lr: 1.76e-02, grad_scale: 8.0 2023-11-18 13:19:12,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=244453.33333333334, ans=0.125 2023-11-18 13:19:16,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=244453.33333333334, ans=0.0 2023-11-18 13:19:32,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=244586.66666666666, ans=0.125 2023-11-18 13:20:06,686 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 650, loss[loss=0.08372, simple_loss=0.09071, pruned_loss=0.02651, audio_tagging_loss=0.01186, over 15206.00 frames. ], tot_loss[loss=0.1131, simple_loss=0.1253, pruned_loss=0.03839, audio_tagging_loss=0.01206, over 2934934.60 frames. ], batch size: 59, lr: 1.75e-02, grad_scale: 8.0 2023-11-18 13:20:08,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=244786.66666666666, ans=0.5 2023-11-18 13:20:15,226 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.783e+01 9.493e+01 1.068e+02 1.179e+02 1.760e+02, threshold=2.137e+02, percent-clipped=0.0 2023-11-18 13:20:18,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=244853.33333333334, ans=0.0 2023-11-18 13:20:24,057 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. limit=6.0 2023-11-18 13:20:42,562 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.06 vs. 
limit=15.0 2023-11-18 13:20:48,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=244986.66666666666, ans=0.125 2023-11-18 13:20:50,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=245053.33333333334, ans=0.2 2023-11-18 13:20:58,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=245053.33333333334, ans=0.125 2023-11-18 13:21:03,276 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 700, loss[loss=0.07145, simple_loss=0.07141, pruned_loss=0.02259, audio_tagging_loss=0.01316, over 15802.00 frames. ], tot_loss[loss=0.1121, simple_loss=0.1241, pruned_loss=0.03798, audio_tagging_loss=0.0121, over 2961860.60 frames. ], batch size: 60, lr: 1.75e-02, grad_scale: 8.0 2023-11-18 13:21:15,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=245186.66666666666, ans=0.0 2023-11-18 13:21:18,324 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=9.786e-01 2023-11-18 13:21:48,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=245386.66666666666, ans=0.125 2023-11-18 13:21:49,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=245386.66666666666, ans=0.125 2023-11-18 13:21:59,685 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 750, loss[loss=0.1228, simple_loss=0.1375, pruned_loss=0.04151, audio_tagging_loss=0.01252, over 15116.00 frames. ], tot_loss[loss=0.1126, simple_loss=0.1249, pruned_loss=0.03814, audio_tagging_loss=0.01204, over 2986079.28 frames. ], batch size: 55, lr: 1.75e-02, grad_scale: 8.0 2023-11-18 13:22:07,024 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.990e+01 9.458e+01 1.066e+02 1.214e+02 1.611e+02, threshold=2.132e+02, percent-clipped=0.0 2023-11-18 13:22:12,743 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.21 vs. limit=15.0 2023-11-18 13:22:14,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=245520.0, ans=0.125 2023-11-18 13:22:23,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=245586.66666666666, ans=0.125 2023-11-18 13:22:26,709 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.95 vs. limit=15.0 2023-11-18 13:22:38,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=245653.33333333334, ans=0.1 2023-11-18 13:22:54,563 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 800, loss[loss=0.1159, simple_loss=0.1338, pruned_loss=0.03686, audio_tagging_loss=0.01213, over 15399.00 frames. ], tot_loss[loss=0.1121, simple_loss=0.1241, pruned_loss=0.03788, audio_tagging_loss=0.01219, over 2996260.84 frames. 
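[Editor's note] grad_scale in the progress lines is the mixed-precision loss-scaling factor, and its halving across these batches (32 → 16 → 8) followed by recovery a few hundred batches later is the signature of dynamic loss scaling: the scale is cut when a step produces inf/nan gradients and grows back after a run of stable steps. A generic sketch using torch.cuda.amp; whether this run wraps GradScaler directly is an assumption:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0)   # illustrative init

def train_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(batch)        # hypothetical model interface
    scaler.scale(loss).backward()
    scaler.step(optimizer)         # skipped (and scale halved) on inf/nan grads
    scaler.update()                # scale creeps back up after stable steps
    return loss.detach()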
], batch size: 56, lr: 1.75e-02, grad_scale: 16.0 2023-11-18 13:23:15,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=245853.33333333334, ans=0.125 2023-11-18 13:23:18,453 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.14 vs. limit=22.5 2023-11-18 13:23:21,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=245920.0, ans=0.025 2023-11-18 13:23:25,820 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.64 vs. limit=12.0 2023-11-18 13:23:37,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=245986.66666666666, ans=0.0 2023-11-18 13:23:47,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=246053.33333333334, ans=0.0 2023-11-18 13:23:49,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=246120.0, ans=0.125 2023-11-18 13:23:50,466 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=14.32 vs. limit=15.0 2023-11-18 13:23:50,685 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 850, loss[loss=0.1211, simple_loss=0.1369, pruned_loss=0.04102, audio_tagging_loss=0.01159, over 15715.00 frames. ], tot_loss[loss=0.1118, simple_loss=0.1237, pruned_loss=0.03774, audio_tagging_loss=0.01223, over 3014214.29 frames. ], batch size: 58, lr: 1.75e-02, grad_scale: 16.0 2023-11-18 13:23:59,085 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.134e+01 9.530e+01 1.051e+02 1.203e+02 1.738e+02, threshold=2.102e+02, percent-clipped=0.0 2023-11-18 13:24:07,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=246186.66666666666, ans=0.1 2023-11-18 13:24:09,449 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.22 vs. limit=22.5 2023-11-18 13:24:13,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=246253.33333333334, ans=0.125 2023-11-18 13:24:14,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=246253.33333333334, ans=0.125 2023-11-18 13:24:19,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=246253.33333333334, ans=0.0 2023-11-18 13:24:42,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=246386.66666666666, ans=0.0 2023-11-18 13:24:47,011 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 900, loss[loss=0.1138, simple_loss=0.1357, pruned_loss=0.032, audio_tagging_loss=0.01394, over 15153.00 frames. ], tot_loss[loss=0.1121, simple_loss=0.1242, pruned_loss=0.03771, audio_tagging_loss=0.01231, over 3027055.01 frames. ], batch size: 59, lr: 1.75e-02, grad_scale: 16.0 2023-11-18 13:24:51,658 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.15 vs. 
limit=6.0 2023-11-18 13:24:54,963 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.52 vs. limit=22.5 2023-11-18 13:25:05,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=246520.0, ans=15.0 2023-11-18 13:25:06,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=246520.0, ans=0.125 2023-11-18 13:25:07,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=246586.66666666666, ans=0.0 2023-11-18 13:25:17,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=246586.66666666666, ans=0.125 2023-11-18 13:25:42,383 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 950, loss[loss=0.1201, simple_loss=0.129, pruned_loss=0.04401, audio_tagging_loss=0.01161, over 16105.00 frames. ], tot_loss[loss=0.1116, simple_loss=0.1239, pruned_loss=0.03747, audio_tagging_loss=0.01217, over 3028858.67 frames. ], batch size: 59, lr: 1.75e-02, grad_scale: 16.0 2023-11-18 13:25:49,266 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.72 vs. limit=15.0 2023-11-18 13:25:49,689 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 9.343e+01 1.032e+02 1.151e+02 2.313e+02, threshold=2.063e+02, percent-clipped=1.0 2023-11-18 13:26:20,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=246986.66666666666, ans=0.125 2023-11-18 13:26:37,942 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 1000, loss[loss=0.08889, simple_loss=0.1064, pruned_loss=0.02396, audio_tagging_loss=0.01174, over 15446.00 frames. ], tot_loss[loss=0.1114, simple_loss=0.1239, pruned_loss=0.0375, audio_tagging_loss=0.01195, over 3033528.80 frames. ], batch size: 59, lr: 1.75e-02, grad_scale: 16.0 2023-11-18 13:27:01,683 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 13:27:28,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=247386.66666666666, ans=0.0 2023-11-18 13:27:33,829 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 1050, loss[loss=0.09574, simple_loss=0.1084, pruned_loss=0.0305, audio_tagging_loss=0.01106, over 16124.00 frames. ], tot_loss[loss=0.1101, simple_loss=0.1226, pruned_loss=0.03686, audio_tagging_loss=0.01191, over 3036531.94 frames. ], batch size: 61, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:27:41,179 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.026e+01 9.797e+01 1.106e+02 1.274e+02 2.848e+02, threshold=2.212e+02, percent-clipped=1.0 2023-11-18 13:27:51,203 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.42 vs. 
limit=15.0 2023-11-18 13:28:07,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=247653.33333333334, ans=0.0 2023-11-18 13:28:28,416 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 1100, loss[loss=0.1265, simple_loss=0.1366, pruned_loss=0.04547, audio_tagging_loss=0.01274, over 15177.00 frames. ], tot_loss[loss=0.1102, simple_loss=0.123, pruned_loss=0.037, audio_tagging_loss=0.01175, over 3039824.98 frames. ], batch size: 56, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:28:30,567 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 13:28:52,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=247920.0, ans=0.125 2023-11-18 13:29:07,367 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.65 vs. limit=15.0 2023-11-18 13:29:16,000 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.17 vs. limit=15.0 2023-11-18 13:29:23,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=248120.0, ans=0.0 2023-11-18 13:29:24,438 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 1150, loss[loss=0.1029, simple_loss=0.1186, pruned_loss=0.03317, audio_tagging_loss=0.01044, over 15078.00 frames. ], tot_loss[loss=0.1094, simple_loss=0.1219, pruned_loss=0.03674, audio_tagging_loss=0.01169, over 3033321.13 frames. ], batch size: 55, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:29:31,759 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 9.396e+01 1.043e+02 1.149e+02 1.593e+02, threshold=2.087e+02, percent-clipped=0.0 2023-11-18 13:29:41,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=248186.66666666666, ans=0.125 2023-11-18 13:30:09,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=248386.66666666666, ans=0.1 2023-11-18 13:30:10,799 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.19 vs. limit=22.5 2023-11-18 13:30:14,832 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.89 vs. limit=15.0 2023-11-18 13:30:21,175 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 1200, loss[loss=0.1268, simple_loss=0.1421, pruned_loss=0.04721, audio_tagging_loss=0.008583, over 14750.00 frames. ], tot_loss[loss=0.1107, simple_loss=0.1232, pruned_loss=0.03743, audio_tagging_loss=0.01169, over 3036564.01 frames. 
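[Editor's note] The WARNING above drops an AudioSet placeholder cut because, after the front end's roughly 4x subsampling, its 100 input frames leave only 23 encoder frames, fewer than its 24 BPE tokens, and a transducer cannot emit more symbols than it has frames. A sketch of that check, using a subsampled-length formula that reproduces the logged 100 -> 23; treat it as an approximation of the recipe's exact filter:

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Length after the convolutional front end:
    # ((100 - 7) // 2 + 1) // 2 == 23, matching the warning.
    t = ((num_frames - 7) // 2 + 1) // 2
    # A transducer needs at least one encoder frame per output token.
    return t >= num_tokens

assert keep_cut(100, 24) is False    # the excluded dummy-text cuts
assert keep_cut(1500, 24) is True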
], batch size: 56, lr: 1.74e-02, grad_scale: 32.0 2023-11-18 13:30:43,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=248586.66666666666, ans=0.1 2023-11-18 13:30:43,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=248586.66666666666, ans=0.125 2023-11-18 13:30:52,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=248653.33333333334, ans=0.0 2023-11-18 13:30:55,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=248653.33333333334, ans=0.0 2023-11-18 13:31:00,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=248653.33333333334, ans=0.125 2023-11-18 13:31:03,334 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.50 vs. limit=22.5 2023-11-18 13:31:04,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=248720.0, ans=0.0 2023-11-18 13:31:13,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=248720.0, ans=0.5 2023-11-18 13:31:15,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=248786.66666666666, ans=0.125 2023-11-18 13:31:16,190 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 1250, loss[loss=0.09992, simple_loss=0.1006, pruned_loss=0.03495, audio_tagging_loss=0.01467, over 14838.00 frames. ], tot_loss[loss=0.111, simple_loss=0.1236, pruned_loss=0.03741, audio_tagging_loss=0.01176, over 3037783.74 frames. ], batch size: 56, lr: 1.74e-02, grad_scale: 32.0 2023-11-18 13:31:23,567 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.729e+01 9.534e+01 1.061e+02 1.217e+02 1.836e+02, threshold=2.122e+02, percent-clipped=0.0 2023-11-18 13:31:27,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=248853.33333333334, ans=0.0 2023-11-18 13:31:43,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=248920.0, ans=0.125 2023-11-18 13:31:43,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=248920.0, ans=0.125 2023-11-18 13:31:52,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=248986.66666666666, ans=0.125 2023-11-18 13:31:57,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=248986.66666666666, ans=0.0 2023-11-18 13:31:59,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=249053.33333333334, ans=0.125 2023-11-18 13:32:04,246 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.66 vs. limit=12.0 2023-11-18 13:32:11,683 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 1300, loss[loss=0.08872, simple_loss=0.09161, pruned_loss=0.03007, audio_tagging_loss=0.01284, over 15045.00 frames. 
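[Editor's note] tot_loss[...] is a running, frame-weighted aggregate rather than a plain epoch mean; the fractional frame counts (e.g. "over 3037783.74 frames") hint at an exponentially decayed sum. A sketch under that assumption, with the decay constant chosen arbitrarily:

class RunningLoss:
    """Frame-weighted loss average with exponential forgetting; a guess at
    how the tot_loss[...] figures are maintained."""

    def __init__(self, decay: float = 0.995):
        self.decay = decay
        self.loss_sum = 0.0
        self.frame_sum = 0.0

    def update(self, loss: float, num_frames: float) -> None:
        self.loss_sum = self.decay * self.loss_sum + loss * num_frames
        self.frame_sum = self.decay * self.frame_sum + num_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frame_sum, 1.0)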
], tot_loss[loss=0.1114, simple_loss=0.1243, pruned_loss=0.03759, audio_tagging_loss=0.01166, over 3034045.73 frames. ], batch size: 56, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:32:23,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=249186.66666666666, ans=0.1 2023-11-18 13:32:35,211 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.60 vs. limit=15.0 2023-11-18 13:32:35,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=249253.33333333334, ans=0.125 2023-11-18 13:32:46,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=249320.0, ans=0.0 2023-11-18 13:32:52,254 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=22.5 2023-11-18 13:33:01,320 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.75 vs. limit=12.0 2023-11-18 13:33:08,239 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 1350, loss[loss=0.1222, simple_loss=0.1445, pruned_loss=0.03912, audio_tagging_loss=0.01086, over 15560.00 frames. ], tot_loss[loss=0.1116, simple_loss=0.1247, pruned_loss=0.03761, audio_tagging_loss=0.01168, over 3036249.64 frames. ], batch size: 56, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:33:12,946 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.74 vs. limit=15.0 2023-11-18 13:33:17,322 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.519e+01 9.738e+01 1.103e+02 1.190e+02 1.796e+02, threshold=2.206e+02, percent-clipped=0.0 2023-11-18 13:33:18,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=249520.0, ans=0.0 2023-11-18 13:33:20,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=249520.0, ans=0.125 2023-11-18 13:33:27,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=249520.0, ans=0.05 2023-11-18 13:33:43,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=249653.33333333334, ans=0.1 2023-11-18 13:33:45,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=249653.33333333334, ans=0.125 2023-11-18 13:33:47,417 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 13:34:04,462 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 1400, loss[loss=0.08494, simple_loss=0.08664, pruned_loss=0.0281, audio_tagging_loss=0.01352, over 14669.00 frames. 
], tot_loss[loss=0.1099, simple_loss=0.1223, pruned_loss=0.03681, audio_tagging_loss=0.01192, over 3029219.30 frames. ], batch size: 57, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:34:32,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=249920.0, ans=0.0 2023-11-18 13:35:00,073 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 1450, loss[loss=0.08272, simple_loss=0.08623, pruned_loss=0.02359, audio_tagging_loss=0.01601, over 14553.00 frames. ], tot_loss[loss=0.1094, simple_loss=0.1216, pruned_loss=0.03654, audio_tagging_loss=0.01203, over 3032088.75 frames. ], batch size: 56, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:35:09,004 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 9.516e+01 1.029e+02 1.105e+02 1.571e+02, threshold=2.057e+02, percent-clipped=0.0 2023-11-18 13:35:15,244 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.40 vs. limit=10.0 2023-11-18 13:35:17,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=250186.66666666666, ans=0.1 2023-11-18 13:35:56,371 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 1500, loss[loss=0.1308, simple_loss=0.1533, pruned_loss=0.04498, audio_tagging_loss=0.009173, over 15565.00 frames. ], tot_loss[loss=0.1109, simple_loss=0.123, pruned_loss=0.03732, audio_tagging_loss=0.01205, over 3033757.85 frames. ], batch size: 56, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:36:09,215 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=15.0 2023-11-18 13:36:40,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=250720.0, ans=0.2 2023-11-18 13:36:45,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=250720.0, ans=0.0 2023-11-18 13:36:52,282 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 1550, loss[loss=0.1182, simple_loss=0.1285, pruned_loss=0.04018, audio_tagging_loss=0.01378, over 15369.00 frames. ], tot_loss[loss=0.1113, simple_loss=0.1233, pruned_loss=0.03756, audio_tagging_loss=0.01212, over 3033467.51 frames. ], batch size: 58, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:37:01,191 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 9.375e+01 1.072e+02 1.254e+02 1.823e+02, threshold=2.144e+02, percent-clipped=0.0 2023-11-18 13:37:11,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=250853.33333333334, ans=0.125 2023-11-18 13:37:38,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=251053.33333333334, ans=0.035 2023-11-18 13:37:47,454 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 1600, loss[loss=0.1421, simple_loss=0.1737, pruned_loss=0.04502, audio_tagging_loss=0.01025, over 16389.00 frames. ], tot_loss[loss=0.1112, simple_loss=0.1233, pruned_loss=0.03745, audio_tagging_loss=0.01209, over 3032724.22 frames. 
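[Editor's note] The learning rate in these lines decays both within and across epochs (1.90e-02 late in epoch 3, 1.77e-02 at the start of epoch 4, down to 1.73e-02 here). That shape matches the kind of schedule icefall calls Eden, where the rate is a product of batch- and epoch-dependent power-law factors; the constants below are placeholders, not this run's actual settings:

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 5000.0, lr_epochs: float = 4.0) -> float:
    # Smooth power-law decay in both the batch index and the epoch index.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor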
], batch size: 57, lr: 1.73e-02, grad_scale: 32.0 2023-11-18 13:37:55,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=251120.0, ans=0.1 2023-11-18 13:37:56,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=251120.0, ans=0.125 2023-11-18 13:37:59,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=251186.66666666666, ans=0.125 2023-11-18 13:38:04,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=251186.66666666666, ans=0.1 2023-11-18 13:38:14,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=251253.33333333334, ans=0.125 2023-11-18 13:38:16,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=251253.33333333334, ans=0.125 2023-11-18 13:38:23,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=251320.0, ans=0.2 2023-11-18 13:38:38,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=251386.66666666666, ans=0.0 2023-11-18 13:38:43,340 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 1650, loss[loss=0.1155, simple_loss=0.1255, pruned_loss=0.0387, audio_tagging_loss=0.01411, over 16092.00 frames. ], tot_loss[loss=0.1109, simple_loss=0.1231, pruned_loss=0.03727, audio_tagging_loss=0.01206, over 3035586.32 frames. ], batch size: 58, lr: 1.73e-02, grad_scale: 32.0 2023-11-18 13:38:49,241 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.81 vs. limit=15.0 2023-11-18 13:38:52,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=251453.33333333334, ans=0.0 2023-11-18 13:38:52,825 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 9.946e+01 1.090e+02 1.261e+02 1.677e+02, threshold=2.181e+02, percent-clipped=0.0 2023-11-18 13:39:10,784 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.89 vs. limit=15.0 2023-11-18 13:39:19,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=251653.33333333334, ans=0.2 2023-11-18 13:39:39,287 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 1700, loss[loss=0.1335, simple_loss=0.1533, pruned_loss=0.04924, audio_tagging_loss=0.00764, over 15060.00 frames. ], tot_loss[loss=0.1111, simple_loss=0.1233, pruned_loss=0.03738, audio_tagging_loss=0.01211, over 3037999.68 frames. ], batch size: 57, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:40:19,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=251986.66666666666, ans=0.125 2023-11-18 13:40:24,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=252053.33333333334, ans=0.125 2023-11-18 13:40:35,207 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 1750, loss[loss=0.1292, simple_loss=0.1474, pruned_loss=0.04097, audio_tagging_loss=0.01451, over 15192.00 frames. 
], tot_loss[loss=0.1106, simple_loss=0.1227, pruned_loss=0.03728, audio_tagging_loss=0.01199, over 3039383.47 frames. ], batch size: 57, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:40:36,951 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.42 vs. limit=22.5 2023-11-18 13:40:43,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=252120.0, ans=0.2 2023-11-18 13:40:45,323 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.182e+01 9.260e+01 1.013e+02 1.177e+02 1.598e+02, threshold=2.026e+02, percent-clipped=0.0 2023-11-18 13:40:45,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=252186.66666666666, ans=0.125 2023-11-18 13:40:49,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=252186.66666666666, ans=0.1 2023-11-18 13:41:12,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=252320.0, ans=0.125 2023-11-18 13:41:19,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=252386.66666666666, ans=0.0 2023-11-18 13:41:19,549 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.16 vs. limit=22.5 2023-11-18 13:41:31,151 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 1800, loss[loss=0.08284, simple_loss=0.09048, pruned_loss=0.0233, audio_tagging_loss=0.01431, over 14332.00 frames. ], tot_loss[loss=0.1103, simple_loss=0.1228, pruned_loss=0.03707, audio_tagging_loss=0.01185, over 3049201.25 frames. ], batch size: 56, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:41:32,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=252453.33333333334, ans=0.0 2023-11-18 13:41:48,383 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.039e-01 2023-11-18 13:42:06,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=252653.33333333334, ans=0.0 2023-11-18 13:42:27,626 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 1850, loss[loss=0.1127, simple_loss=0.1313, pruned_loss=0.03703, audio_tagging_loss=0.009977, over 16373.00 frames. ], tot_loss[loss=0.11, simple_loss=0.1225, pruned_loss=0.03696, audio_tagging_loss=0.01177, over 3051285.00 frames. 
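[Editor's note] The scaling.py "WithLoss" lines (e.g. loss-sum=5.039e-01 above) report an auxiliary penalty attached to attention weights that steers them without changing the forward output. One plausible way to implement such a hook, shown purely as a sketch since the real mechanism in scaling.py may differ, is an autograd function that is the identity in the forward pass but injects the penalty's gradient in the backward pass:

import torch

class WithAuxLoss(torch.autograd.Function):
    """Identity in forward; backward additionally pushes x toward minimizing
    scale * mean(x**2). Illustrative only, not the scaling.py implementation."""

    @staticmethod
    def forward(ctx, x: torch.Tensor, scale: float) -> torch.Tensor:
        ctx.save_for_backward(x)
        ctx.scale = scale
        return x

    @staticmethod
    def backward(ctx, grad_out: torch.Tensor):
        (x,) = ctx.saved_tensors
        aux_grad = ctx.scale * 2.0 * x / x.numel()  # d/dx of scale * mean(x**2)
        return grad_out + aux_grad, None

# usage: attn_weights = WithAuxLoss.apply(attn_weights, 0.01)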
], batch size: 59, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:42:37,088 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.497e+01 9.907e+01 1.064e+02 1.171e+02 1.741e+02, threshold=2.129e+02, percent-clipped=0.0 2023-11-18 13:42:37,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=252853.33333333334, ans=0.07 2023-11-18 13:43:02,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=252986.66666666666, ans=0.125 2023-11-18 13:43:08,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=252986.66666666666, ans=0.2 2023-11-18 13:43:22,177 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 1900, loss[loss=0.09773, simple_loss=0.1209, pruned_loss=0.02785, audio_tagging_loss=0.009428, over 15005.00 frames. ], tot_loss[loss=0.1098, simple_loss=0.1224, pruned_loss=0.03687, audio_tagging_loss=0.01176, over 3047846.28 frames. ], batch size: 55, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:43:37,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=253186.66666666666, ans=0.1 2023-11-18 13:43:39,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=253186.66666666666, ans=0.0 2023-11-18 13:43:41,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=253186.66666666666, ans=0.95 2023-11-18 13:44:07,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=253386.66666666666, ans=0.2 2023-11-18 13:44:15,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=253386.66666666666, ans=0.125 2023-11-18 13:44:18,708 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 1950, loss[loss=0.1009, simple_loss=0.1087, pruned_loss=0.03153, audio_tagging_loss=0.01501, over 16542.00 frames. ], tot_loss[loss=0.1091, simple_loss=0.1214, pruned_loss=0.03656, audio_tagging_loss=0.01187, over 3047258.62 frames. ], batch size: 62, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:44:26,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=253453.33333333334, ans=0.2 2023-11-18 13:44:29,458 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.131e+01 9.223e+01 1.021e+02 1.142e+02 1.490e+02, threshold=2.042e+02, percent-clipped=0.0 2023-11-18 13:44:47,183 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.66 vs. limit=15.0 2023-11-18 13:44:48,247 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.73 vs. 
limit=22.5 2023-11-18 13:44:51,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=253653.33333333334, ans=0.0 2023-11-18 13:45:03,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=253720.0, ans=0.1 2023-11-18 13:45:07,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=253720.0, ans=0.125 2023-11-18 13:45:15,429 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 2000, loss[loss=0.1076, simple_loss=0.1278, pruned_loss=0.03302, audio_tagging_loss=0.01074, over 14505.00 frames. ], tot_loss[loss=0.109, simple_loss=0.1212, pruned_loss=0.03655, audio_tagging_loss=0.01185, over 3046713.59 frames. ], batch size: 54, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:45:28,758 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.77 vs. limit=15.0 2023-11-18 13:45:31,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=253853.33333333334, ans=0.125 2023-11-18 13:45:32,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=253853.33333333334, ans=0.125 2023-11-18 13:45:34,109 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.48 vs. limit=15.0 2023-11-18 13:45:35,138 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.25 vs. limit=15.0 2023-11-18 13:45:39,816 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.72 vs. limit=15.0 2023-11-18 13:45:40,087 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.52 vs. limit=15.0 2023-11-18 13:46:10,809 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 2050, loss[loss=0.1253, simple_loss=0.1351, pruned_loss=0.0455, audio_tagging_loss=0.01226, over 14065.00 frames. ], tot_loss[loss=0.11, simple_loss=0.1227, pruned_loss=0.03697, audio_tagging_loss=0.01164, over 3045320.54 frames. 
], batch size: 55, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:46:21,820 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 9.338e+01 1.033e+02 1.135e+02 2.200e+02, threshold=2.065e+02, percent-clipped=0.0 2023-11-18 13:46:23,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=254186.66666666666, ans=0.1 2023-11-18 13:46:32,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=254253.33333333334, ans=0.125 2023-11-18 13:46:37,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=254253.33333333334, ans=0.125 2023-11-18 13:46:51,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=254320.0, ans=0.125 2023-11-18 13:47:06,220 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 2100, loss[loss=0.1159, simple_loss=0.1431, pruned_loss=0.03533, audio_tagging_loss=0.009048, over 15302.00 frames. ], tot_loss[loss=0.1103, simple_loss=0.1233, pruned_loss=0.03703, audio_tagging_loss=0.01162, over 3048627.90 frames. ], batch size: 57, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:47:08,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=254453.33333333334, ans=0.125 2023-11-18 13:47:23,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=254520.0, ans=0.125 2023-11-18 13:47:29,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=254586.66666666666, ans=0.1 2023-11-18 13:47:36,266 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.20 vs. limit=22.5 2023-11-18 13:47:36,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=254586.66666666666, ans=0.125 2023-11-18 13:47:43,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=254653.33333333334, ans=0.125 2023-11-18 13:47:43,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=254653.33333333334, ans=0.1 2023-11-18 13:47:45,167 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=15.0 2023-11-18 13:47:49,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=254653.33333333334, ans=0.125 2023-11-18 13:47:52,476 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:47:55,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=254720.0, ans=0.125 2023-11-18 13:48:03,508 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 2150, loss[loss=0.108, simple_loss=0.1108, pruned_loss=0.04148, audio_tagging_loss=0.01111, over 15354.00 frames. ], tot_loss[loss=0.1116, simple_loss=0.1246, pruned_loss=0.03765, audio_tagging_loss=0.01168, over 3050631.64 frames. 
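The [optim.py:476] lines are worth decoding: the five grad-norm quartiles are min / 25% / median / 75% / max over recent gradient norms, and in every Clipping_scale=2.0 record the reported threshold is twice the median (for instance 2.0 * 1.033e+02 ~= 2.065e+02 just above). A plausible reconstruction of that bookkeeping, offered as a sketch rather than as icefall's actual implementation:

    import torch

    def median_based_clip_stats(recent_norms, clipping_scale=2.0):
        # recent_norms: gradient norms from the last few hundred batches.
        t = torch.tensor(recent_norms)
        quartiles = torch.quantile(t, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * quartiles[2]             # 2.0 x median
        percent_clipped = 100.0 * (t > threshold).float().mean()
        return quartiles, threshold, percent_clipped

Clipping against a multiple of the median adapts the threshold to the model's own gradient scale instead of using a fixed constant; percent-clipped then reports how often norms exceeded it, which is 0.0 in most records here and briefly 1.0 around batch 2150.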
], batch size: 58, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:48:14,125 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.668e+01 9.557e+01 1.080e+02 1.239e+02 1.582e+02, threshold=2.161e+02, percent-clipped=1.0 2023-11-18 13:48:27,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=254920.0, ans=0.125 2023-11-18 13:48:31,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=254920.0, ans=0.125 2023-11-18 13:48:36,063 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 13:48:58,275 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 2200, loss[loss=0.1333, simple_loss=0.1524, pruned_loss=0.04661, audio_tagging_loss=0.01055, over 15154.00 frames. ], tot_loss[loss=0.1114, simple_loss=0.124, pruned_loss=0.03756, audio_tagging_loss=0.01183, over 3046346.34 frames. ], batch size: 57, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:49:53,795 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 2250, loss[loss=0.1464, simple_loss=0.1622, pruned_loss=0.05476, audio_tagging_loss=0.01057, over 13882.00 frames. ], tot_loss[loss=0.1116, simple_loss=0.1244, pruned_loss=0.03752, audio_tagging_loss=0.01184, over 3048932.62 frames. ], batch size: 54, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:49:56,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=255453.33333333334, ans=0.125 2023-11-18 13:50:05,599 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.545e+01 9.450e+01 1.063e+02 1.205e+02 1.681e+02, threshold=2.126e+02, percent-clipped=0.0 2023-11-18 13:50:09,883 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.36 vs. limit=22.5 2023-11-18 13:50:15,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=255520.0, ans=0.0 2023-11-18 13:50:36,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=255653.33333333334, ans=0.2 2023-11-18 13:50:38,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=255720.0, ans=0.5 2023-11-18 13:50:45,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=255720.0, ans=0.0 2023-11-18 13:50:48,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=255720.0, ans=0.0 2023-11-18 13:50:50,931 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 2300, loss[loss=0.1078, simple_loss=0.1168, pruned_loss=0.03679, audio_tagging_loss=0.01257, over 14795.00 frames. ], tot_loss[loss=0.1109, simple_loss=0.1239, pruned_loss=0.03712, audio_tagging_loss=0.01184, over 3051731.45 frames. 
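The WARNING just above records the filter applied to degenerate AudioSet cuts: this 1-second clip has 100 feature frames, the encoder's subsampling reduces that to 23 output frames, and the dummy placeholder transcript tokenizes to 24 BPE tokens; with fewer output frames than tokens the transducer loss cannot align the sequence, so the cut is excluded. A sketch of such a filter; the subsampled-length formula ((T - 7) // 2) // 2 is an assumption that happens to reproduce 100 -> 23, not a quoted piece of the code:

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Assumed conv-subsampling length formula (reproduces 100 -> 23).
        frames_after_subsampling = ((num_frames - 7) // 2) // 2
        # Transducer-style constraint: at least one output frame per token.
        return frames_after_subsampling >= num_tokens

    assert not keep_cut(100, 24)    # the excluded dummy-text AudioSet cuts
    assert keep_cut(1500, 24)       # a typical 15-second speech utterance

The identical "Dummy text added as a place holder" warnings that recur through the log are all instances of the same situation: 1-second tagging-only clips whose filler transcript is longer than their subsampled frame count.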
], batch size: 57, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:51:11,704 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.68 vs. limit=15.0 2023-11-18 13:51:16,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=255920.0, ans=0.125 2023-11-18 13:51:17,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=255920.0, ans=0.125 2023-11-18 13:51:26,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=255986.66666666666, ans=0.0 2023-11-18 13:51:31,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=255986.66666666666, ans=0.0 2023-11-18 13:51:32,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=255986.66666666666, ans=0.04949747468305833 2023-11-18 13:51:32,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=255986.66666666666, ans=0.125 2023-11-18 13:51:34,923 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.64 vs. limit=15.0 2023-11-18 13:51:35,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=256053.33333333334, ans=0.125 2023-11-18 13:51:39,502 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.99 vs. limit=22.5 2023-11-18 13:51:39,959 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 13:51:46,287 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 2350, loss[loss=0.09424, simple_loss=0.1035, pruned_loss=0.02906, audio_tagging_loss=0.01345, over 15088.00 frames. ], tot_loss[loss=0.1108, simple_loss=0.1234, pruned_loss=0.0371, audio_tagging_loss=0.01201, over 3047307.53 frames. ], batch size: 57, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:51:57,478 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 9.372e+01 1.028e+02 1.162e+02 1.776e+02, threshold=2.057e+02, percent-clipped=0.0 2023-11-18 13:52:09,722 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.57 vs. 
limit=15.0 2023-11-18 13:52:11,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=256253.33333333334, ans=0.2 2023-11-18 13:52:13,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=256253.33333333334, ans=0.125 2023-11-18 13:52:15,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=256253.33333333334, ans=0.0 2023-11-18 13:52:17,336 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.79 vs. limit=15.0 2023-11-18 13:52:34,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=256386.66666666666, ans=0.1 2023-11-18 13:52:41,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=256453.33333333334, ans=0.1 2023-11-18 13:52:42,247 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 2400, loss[loss=0.08171, simple_loss=0.08932, pruned_loss=0.02162, audio_tagging_loss=0.01544, over 14215.00 frames. ], tot_loss[loss=0.1119, simple_loss=0.1245, pruned_loss=0.03766, audio_tagging_loss=0.01205, over 3046251.19 frames. ], batch size: 56, lr: 1.72e-02, grad_scale: 32.0 2023-11-18 13:52:57,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=256520.0, ans=0.125 2023-11-18 13:53:00,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=256520.0, ans=0.125 2023-11-18 13:53:05,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=256586.66666666666, ans=0.125 2023-11-18 13:53:08,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=256586.66666666666, ans=0.1 2023-11-18 13:53:09,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=256586.66666666666, ans=0.04949747468305833 2023-11-18 13:53:17,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=256653.33333333334, ans=0.2 2023-11-18 13:53:24,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=256653.33333333334, ans=0.0 2023-11-18 13:53:27,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=256720.0, ans=0.07 2023-11-18 13:53:30,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=256720.0, ans=0.04949747468305833 2023-11-18 13:53:38,556 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 2450, loss[loss=0.09974, simple_loss=0.1256, pruned_loss=0.02533, audio_tagging_loss=0.01163, over 16107.00 frames. ], tot_loss[loss=0.1121, simple_loss=0.1246, pruned_loss=0.03775, audio_tagging_loss=0.0121, over 3040016.16 frames. 
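grad_scale in these records is the dynamic loss-scaling factor for the fp16 run; it doubles from 16.0 to 32.0 at batch 2400 here, and doubles again to 64.0 by batch 4000 further down. That is the signature of an AMP-style scaler that grows the scale after a sustained overflow-free run and backs off when an overflow is detected. A generic sketch of the mechanism (the growth interval and factor are illustrative defaults, not values read from this run):

    class DynamicLossScale:
        """Minimal AMP-style loss scaler: grow after N clean steps, shrink on overflow."""
        def __init__(self, init_scale=16.0, growth_factor=2.0, growth_interval=2000):
            self.scale = init_scale
            self.growth_factor = growth_factor
            self.growth_interval = growth_interval
            self._clean_steps = 0

        def update(self, found_overflow: bool) -> None:
            if found_overflow:
                self.scale /= self.growth_factor    # back off immediately
                self._clean_steps = 0
            else:
                self._clean_steps += 1
                if self._clean_steps >= self.growth_interval:
                    self.scale *= self.growth_factor
                    self._clean_steps = 0

Because overflow events reset the clean-step counter, the doubling points in a real log need not be evenly spaced, which matches the uneven spacing seen here.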
], batch size: 60, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:53:49,525 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 9.544e+01 1.043e+02 1.156e+02 1.781e+02, threshold=2.086e+02, percent-clipped=0.0 2023-11-18 13:54:08,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=256920.0, ans=0.125 2023-11-18 13:54:10,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=256986.66666666666, ans=10.0 2023-11-18 13:54:14,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=256986.66666666666, ans=0.0 2023-11-18 13:54:21,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=257053.33333333334, ans=0.125 2023-11-18 13:54:25,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=257053.33333333334, ans=0.0 2023-11-18 13:54:32,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=257053.33333333334, ans=0.0 2023-11-18 13:54:33,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=257120.0, ans=0.125 2023-11-18 13:54:33,905 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 2500, loss[loss=0.1179, simple_loss=0.1285, pruned_loss=0.04303, audio_tagging_loss=0.01064, over 14433.00 frames. ], tot_loss[loss=0.1122, simple_loss=0.1249, pruned_loss=0.03774, audio_tagging_loss=0.01206, over 3042670.70 frames. ], batch size: 53, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:54:37,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=257120.0, ans=0.125 2023-11-18 13:54:41,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=257120.0, ans=0.125 2023-11-18 13:55:03,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=257253.33333333334, ans=0.0 2023-11-18 13:55:06,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=257320.0, ans=0.5 2023-11-18 13:55:29,859 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 2550, loss[loss=0.1024, simple_loss=0.1106, pruned_loss=0.03191, audio_tagging_loss=0.01523, over 14797.00 frames. ], tot_loss[loss=0.1114, simple_loss=0.1237, pruned_loss=0.03749, audio_tagging_loss=0.01203, over 3043236.25 frames. ], batch size: 57, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:55:40,569 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.163e+01 9.922e+01 1.114e+02 1.302e+02 1.822e+02, threshold=2.229e+02, percent-clipped=0.0 2023-11-18 13:55:50,582 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.32 vs. limit=15.0 2023-11-18 13:56:09,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=257653.33333333334, ans=0.0 2023-11-18 13:56:25,578 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 2600, loss[loss=0.1122, simple_loss=0.1184, pruned_loss=0.03598, audio_tagging_loss=0.01704, over 13927.00 frames. 
], tot_loss[loss=0.1101, simple_loss=0.1222, pruned_loss=0.03708, audio_tagging_loss=0.01195, over 3037730.38 frames. ], batch size: 53, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:56:37,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=257853.33333333334, ans=0.1 2023-11-18 13:56:44,121 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.85 vs. limit=15.0 2023-11-18 13:56:53,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=257920.0, ans=0.2 2023-11-18 13:57:17,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=258053.33333333334, ans=0.125 2023-11-18 13:57:20,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=258120.0, ans=0.125 2023-11-18 13:57:21,396 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 2650, loss[loss=0.06607, simple_loss=0.06491, pruned_loss=0.02397, audio_tagging_loss=0.009643, over 14335.00 frames. ], tot_loss[loss=0.1119, simple_loss=0.1247, pruned_loss=0.03785, audio_tagging_loss=0.01172, over 3033930.98 frames. ], batch size: 57, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:57:22,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=258120.0, ans=0.0 2023-11-18 13:57:32,558 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.000e+01 9.528e+01 1.033e+02 1.143e+02 1.471e+02, threshold=2.065e+02, percent-clipped=0.0 2023-11-18 13:58:04,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=258320.0, ans=0.2 2023-11-18 13:58:17,139 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 2700, loss[loss=0.1007, simple_loss=0.1118, pruned_loss=0.03434, audio_tagging_loss=0.01051, over 14738.00 frames. ], tot_loss[loss=0.1104, simple_loss=0.1232, pruned_loss=0.03717, audio_tagging_loss=0.01163, over 3038096.25 frames. ], batch size: 56, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:58:22,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=258453.33333333334, ans=0.0 2023-11-18 13:58:25,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=258453.33333333334, ans=0.125 2023-11-18 13:59:13,215 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 2750, loss[loss=0.08639, simple_loss=0.08907, pruned_loss=0.02881, audio_tagging_loss=0.01305, over 14027.00 frames. ], tot_loss[loss=0.1098, simple_loss=0.1223, pruned_loss=0.03691, audio_tagging_loss=0.01171, over 3034326.46 frames. ], batch size: 56, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:59:24,865 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.704e+01 9.301e+01 1.031e+02 1.106e+02 1.514e+02, threshold=2.061e+02, percent-clipped=0.0 2023-11-18 13:59:29,620 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.69 vs. limit=6.0 2023-11-18 13:59:37,809 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.75 vs. 
limit=15.0 2023-11-18 13:59:37,884 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.97 vs. limit=15.0 2023-11-18 13:59:39,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=258920.0, ans=0.125 2023-11-18 13:59:42,877 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.18 vs. limit=15.0 2023-11-18 14:00:01,472 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 14:00:08,824 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 2800, loss[loss=0.139, simple_loss=0.1516, pruned_loss=0.05518, audio_tagging_loss=0.007985, over 15997.00 frames. ], tot_loss[loss=0.1097, simple_loss=0.1217, pruned_loss=0.03705, audio_tagging_loss=0.01178, over 3034491.68 frames. ], batch size: 58, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 14:00:43,329 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.61 vs. limit=22.5 2023-11-18 14:01:04,440 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 2850, loss[loss=0.155, simple_loss=0.171, pruned_loss=0.06006, audio_tagging_loss=0.009391, over 15923.00 frames. ], tot_loss[loss=0.1102, simple_loss=0.1224, pruned_loss=0.03724, audio_tagging_loss=0.01173, over 3038112.76 frames. ], batch size: 56, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 14:01:04,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.33 vs. limit=15.0 2023-11-18 14:01:05,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=259453.33333333334, ans=0.125 2023-11-18 14:01:15,600 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 9.766e+01 1.049e+02 1.164e+02 1.614e+02, threshold=2.099e+02, percent-clipped=0.0 2023-11-18 14:01:41,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=259653.33333333334, ans=0.125 2023-11-18 14:01:49,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=259720.0, ans=0.0 2023-11-18 14:01:50,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=259720.0, ans=0.0 2023-11-18 14:01:51,368 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.24 vs. limit=22.5 2023-11-18 14:01:55,486 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.05 vs. limit=15.0 2023-11-18 14:02:00,176 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 2900, loss[loss=0.1003, simple_loss=0.1103, pruned_loss=0.03483, audio_tagging_loss=0.01028, over 16035.00 frames. 
], tot_loss[loss=0.1106, simple_loss=0.123, pruned_loss=0.03745, audio_tagging_loss=0.01166, over 3044186.63 frames. ], batch size: 59, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:02:07,119 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.17 vs. limit=15.0 2023-11-18 14:02:12,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=259853.33333333334, ans=0.125 2023-11-18 14:02:13,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=259853.33333333334, ans=0.125 2023-11-18 14:02:16,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=259853.33333333334, ans=0.125 2023-11-18 14:02:30,597 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:02:33,582 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.92 vs. limit=15.0 2023-11-18 14:02:38,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=259986.66666666666, ans=0.0 2023-11-18 14:02:51,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=260053.33333333334, ans=0.125 2023-11-18 14:02:56,628 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 2950, loss[loss=0.106, simple_loss=0.1146, pruned_loss=0.03656, audio_tagging_loss=0.01209, over 14109.00 frames. ], tot_loss[loss=0.111, simple_loss=0.1233, pruned_loss=0.03767, audio_tagging_loss=0.01166, over 3043495.26 frames. ], batch size: 55, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:03:07,250 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 9.370e+01 1.013e+02 1.101e+02 1.808e+02, threshold=2.027e+02, percent-clipped=0.0 2023-11-18 14:03:12,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=260186.66666666666, ans=0.125 2023-11-18 14:03:14,718 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.41 vs. limit=10.0 2023-11-18 14:03:18,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=260253.33333333334, ans=0.125 2023-11-18 14:03:20,641 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.16 vs. limit=15.0 2023-11-18 14:03:23,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=260253.33333333334, ans=0.125 2023-11-18 14:03:38,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=260320.0, ans=0.0 2023-11-18 14:03:44,918 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.96 vs. limit=15.0 2023-11-18 14:03:51,839 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 3000, loss[loss=0.1173, simple_loss=0.132, pruned_loss=0.03889, audio_tagging_loss=0.01237, over 14283.00 frames. 
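Most of the volume in this log comes from the [scaling.py:213] ScheduledFloat lines: each prints the current value (ans) of a hyper-parameter that is scheduled against batch_count: dropout probabilities, skip rates, balancer probabilities, bypass scale minimums, and so on. By this point in training (batch_count around 260k) nearly all of them have settled at their final values (0.1, 0.125, 0.0, 0.2, ...). A schedule of this kind is just piecewise-linear interpolation over (batch_count, value) breakpoints; a minimal self-contained sketch, with made-up breakpoints for illustration:

    class PiecewiseLinear:
        """Hyper-parameter value interpolated between (batch_count, value) points."""
        def __init__(self, *points):
            self.points = sorted(points)

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # E.g. a dropout that anneals from 0.3 to 0.1 over the first 20k batch counts:
    dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))
    assert dropout_p(252186.67) == 0.1    # long past the final breakpoint

Logging the resolved value alongside batch_count makes it possible to audit, after the fact, exactly which regularization strength every module was using at any point in training.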
], tot_loss[loss=0.1115, simple_loss=0.1241, pruned_loss=0.0377, audio_tagging_loss=0.01174, over 3039297.40 frames. ], batch size: 54, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:03:51,840 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-18 14:04:25,237 INFO [train_asr.py:1147] (3/4) Epoch 4, validation: loss=0.07718, simple_loss=0.06278, pruned_loss=0.01045, audio_tagging_loss=0.03534, over 4681554.00 frames. 2023-11-18 14:04:25,238 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-18 14:04:32,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=260453.33333333334, ans=0.0 2023-11-18 14:05:07,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=260653.33333333334, ans=0.0 2023-11-18 14:05:20,216 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 3050, loss[loss=0.07348, simple_loss=0.08223, pruned_loss=0.02202, audio_tagging_loss=0.01033, over 14454.00 frames. ], tot_loss[loss=0.1125, simple_loss=0.1255, pruned_loss=0.03802, audio_tagging_loss=0.01173, over 3037840.90 frames. ], batch size: 58, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:05:22,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=260786.66666666666, ans=0.0 2023-11-18 14:05:30,850 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.697e+01 9.501e+01 1.094e+02 1.227e+02 1.890e+02, threshold=2.188e+02, percent-clipped=0.0 2023-11-18 14:05:42,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=260920.0, ans=0.0 2023-11-18 14:05:53,400 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 14:06:04,684 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.49 vs. limit=12.0 2023-11-18 14:06:14,120 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.88 vs. limit=15.0 2023-11-18 14:06:15,724 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 3100, loss[loss=0.08883, simple_loss=0.09734, pruned_loss=0.02705, audio_tagging_loss=0.01311, over 14692.00 frames. ], tot_loss[loss=0.1119, simple_loss=0.1247, pruned_loss=0.03776, audio_tagging_loss=0.01176, over 3041111.79 frames. 
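The epoch-4 validation record above obeys the same combination as the training totals: 0.5 * 0.06278 + 0.01045 + 0.03534 = 0.07718. The mix, however, is very different: relative to the concurrent training averages, pruned_loss drops sharply (about 0.038 -> 0.0105) while audio_tagging_loss roughly triples (about 0.012 -> 0.0353). One plausible reading is that the audio-tagging head generalizes less well to the held-out cuts than the transducer does, though a different train/validation data mix would produce the same gap, so the numbers alone do not settle it.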
], batch size: 56, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:06:19,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=261120.0, ans=0.125 2023-11-18 14:06:27,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=261186.66666666666, ans=0.0 2023-11-18 14:06:47,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=261253.33333333334, ans=0.0 2023-11-18 14:06:52,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=261320.0, ans=0.125 2023-11-18 14:07:09,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=261386.66666666666, ans=0.125 2023-11-18 14:07:12,445 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 3150, loss[loss=0.1468, simple_loss=0.1754, pruned_loss=0.05094, audio_tagging_loss=0.008176, over 15715.00 frames. ], tot_loss[loss=0.1121, simple_loss=0.1248, pruned_loss=0.03787, audio_tagging_loss=0.01177, over 3037230.60 frames. ], batch size: 57, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:07:24,260 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.688e+01 9.617e+01 1.054e+02 1.142e+02 1.769e+02, threshold=2.109e+02, percent-clipped=0.0 2023-11-18 14:07:43,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=261586.66666666666, ans=0.125 2023-11-18 14:07:59,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=261720.0, ans=0.0 2023-11-18 14:08:02,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=261720.0, ans=0.0 2023-11-18 14:08:09,111 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 3200, loss[loss=0.09823, simple_loss=0.1076, pruned_loss=0.03146, audio_tagging_loss=0.01296, over 15964.00 frames. ], tot_loss[loss=0.1119, simple_loss=0.1244, pruned_loss=0.03773, audio_tagging_loss=0.01198, over 3047937.58 frames. ], batch size: 62, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:08:11,632 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.92 vs. limit=15.0 2023-11-18 14:08:12,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=261786.66666666666, ans=0.1 2023-11-18 14:08:18,291 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.46 vs. limit=15.0 2023-11-18 14:08:24,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=261853.33333333334, ans=0.2 2023-11-18 14:08:31,362 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.15 vs. 
limit=12.0 2023-11-18 14:08:37,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=261920.0, ans=0.125 2023-11-18 14:08:47,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=261986.66666666666, ans=0.0 2023-11-18 14:09:00,313 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.34 vs. limit=15.0 2023-11-18 14:09:02,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=262053.33333333334, ans=0.125 2023-11-18 14:09:04,136 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 3250, loss[loss=0.104, simple_loss=0.1204, pruned_loss=0.03068, audio_tagging_loss=0.01312, over 15227.00 frames. ], tot_loss[loss=0.112, simple_loss=0.1246, pruned_loss=0.03774, audio_tagging_loss=0.01197, over 3045512.63 frames. ], batch size: 56, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:09:09,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=262120.0, ans=0.125 2023-11-18 14:09:15,321 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.287e+01 9.218e+01 1.067e+02 1.190e+02 1.746e+02, threshold=2.133e+02, percent-clipped=0.0 2023-11-18 14:09:24,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=262186.6666666667, ans=0.1 2023-11-18 14:09:40,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=262320.0, ans=0.0 2023-11-18 14:09:48,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=262386.6666666667, ans=0.0 2023-11-18 14:09:59,320 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 3300, loss[loss=0.1081, simple_loss=0.1175, pruned_loss=0.03668, audio_tagging_loss=0.01262, over 15335.00 frames. ], tot_loss[loss=0.1117, simple_loss=0.1241, pruned_loss=0.03753, audio_tagging_loss=0.01206, over 3035306.91 frames. ], batch size: 57, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:10:13,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=262520.0, ans=0.125 2023-11-18 14:10:41,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=262653.3333333333, ans=0.2 2023-11-18 14:10:49,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=262720.0, ans=0.2 2023-11-18 14:10:56,640 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 3350, loss[loss=0.1098, simple_loss=0.123, pruned_loss=0.03306, audio_tagging_loss=0.01524, over 14342.00 frames. ], tot_loss[loss=0.111, simple_loss=0.1234, pruned_loss=0.03725, audio_tagging_loss=0.01207, over 3042976.57 frames. ], batch size: 53, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:11:07,052 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.720e+01 9.463e+01 1.035e+02 1.183e+02 1.659e+02, threshold=2.070e+02, percent-clipped=0.0 2023-11-18 14:11:21,040 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.23 vs. 
limit=22.5 2023-11-18 14:11:32,499 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0 2023-11-18 14:11:42,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=263053.3333333333, ans=0.125 2023-11-18 14:11:43,552 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.09 vs. limit=12.0 2023-11-18 14:11:51,603 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 3400, loss[loss=0.1224, simple_loss=0.1388, pruned_loss=0.04386, audio_tagging_loss=0.009105, over 16286.00 frames. ], tot_loss[loss=0.1108, simple_loss=0.1234, pruned_loss=0.03726, audio_tagging_loss=0.01183, over 3050575.11 frames. ], batch size: 58, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:11:56,419 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.86 vs. limit=15.0 2023-11-18 14:11:58,491 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.34 vs. limit=15.0 2023-11-18 14:11:58,524 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.56 vs. limit=15.0 2023-11-18 14:12:01,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=263186.6666666667, ans=0.125 2023-11-18 14:12:13,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=263253.3333333333, ans=0.125 2023-11-18 14:12:19,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=263253.3333333333, ans=0.125 2023-11-18 14:12:26,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=263320.0, ans=0.0 2023-11-18 14:12:43,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=263386.6666666667, ans=0.5 2023-11-18 14:12:47,597 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 3450, loss[loss=0.07204, simple_loss=0.07724, pruned_loss=0.01981, audio_tagging_loss=0.01362, over 15824.00 frames. ], tot_loss[loss=0.1104, simple_loss=0.1231, pruned_loss=0.03709, audio_tagging_loss=0.01172, over 3053673.40 frames. 
], batch size: 60, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:12:59,380 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 9.398e+01 1.018e+02 1.161e+02 1.639e+02, threshold=2.037e+02, percent-clipped=0.0 2023-11-18 14:13:03,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=263520.0, ans=0.125 2023-11-18 14:13:14,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=263586.6666666667, ans=0.1 2023-11-18 14:13:15,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=263586.6666666667, ans=0.125 2023-11-18 14:13:33,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=263720.0, ans=0.035 2023-11-18 14:13:40,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=263720.0, ans=0.0 2023-11-18 14:13:44,345 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 3500, loss[loss=0.1091, simple_loss=0.1272, pruned_loss=0.03398, audio_tagging_loss=0.01152, over 15795.00 frames. ], tot_loss[loss=0.1094, simple_loss=0.1221, pruned_loss=0.03663, audio_tagging_loss=0.0117, over 3042406.87 frames. ], batch size: 57, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:13:57,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=263853.3333333333, ans=0.125 2023-11-18 14:14:04,619 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.80 vs. limit=15.0 2023-11-18 14:14:11,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=263920.0, ans=0.125 2023-11-18 14:14:12,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=263920.0, ans=0.125 2023-11-18 14:14:13,273 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 14:14:24,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=263986.6666666667, ans=0.0 2023-11-18 14:14:40,051 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 3550, loss[loss=0.1072, simple_loss=0.1372, pruned_loss=0.03179, audio_tagging_loss=0.006825, over 14056.00 frames. ], tot_loss[loss=0.11, simple_loss=0.1227, pruned_loss=0.03691, audio_tagging_loss=0.01171, over 3046000.69 frames. 
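The Whitening lines compare a per-module whiteness metric against a limit that depends on the module and its schedule ("metric=X vs. limit=Y"); the natural reading is that the whitening penalty engages only when the metric exceeds its limit, and most records here sit below. One standard measure with the right behaviour, equal to 1.0 for a perfectly white (isotropic) channel covariance and growing as variance concentrates in fewer directions, is mean(lambda^2) / mean(lambda)^2 over the covariance eigenvalues lambda; whether this is exactly the metric zipformer's scaling.py computes is an assumption, so the sketch below is illustrative:

    import torch

    def whiteness_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels) activations from one module.
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]                   # channel covariance
        eigs = torch.linalg.eigvalsh(cov)              # real eigenvalues
        return (eigs ** 2).mean() / eigs.mean() ** 2   # 1.0 == perfectly white

    x = torch.randn(1000, 256)                         # near-isotropic input
    assert whiteness_metric(x) < 1.5
    x_skewed = x * torch.linspace(0.0, 2.0, 256)       # anisotropic channels
    assert whiteness_metric(x_skewed) > whiteness_metric(x)

On this reading, a record such as "metric=22.23 vs. limit=22.5" (around batch 3350, above) shows a module just inside its budget, while the very low conv-module metrics (roughly 2-5 against a limit of 15.0) indicate features that are already comfortably white.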
], batch size: 52, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:14:41,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=264120.0, ans=0.2 2023-11-18 14:14:44,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=264120.0, ans=0.1 2023-11-18 14:14:46,907 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.46 vs. limit=15.0 2023-11-18 14:14:51,084 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 9.400e+01 1.087e+02 1.239e+02 1.521e+02, threshold=2.174e+02, percent-clipped=0.0 2023-11-18 14:14:54,725 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.90 vs. limit=15.0 2023-11-18 14:15:21,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=264320.0, ans=0.0 2023-11-18 14:15:23,073 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2023-11-18 14:15:35,546 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 3600, loss[loss=0.09551, simple_loss=0.1124, pruned_loss=0.0291, audio_tagging_loss=0.01019, over 15413.00 frames. ], tot_loss[loss=0.1083, simple_loss=0.1208, pruned_loss=0.03617, audio_tagging_loss=0.01173, over 3051085.74 frames. ], batch size: 58, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:15:52,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=264520.0, ans=0.2 2023-11-18 14:15:58,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=264586.6666666667, ans=0.0 2023-11-18 14:16:12,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=264653.3333333333, ans=0.1 2023-11-18 14:16:16,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=264653.3333333333, ans=0.0 2023-11-18 14:16:17,500 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.74 vs. limit=15.0 2023-11-18 14:16:19,792 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.63 vs. limit=6.0 2023-11-18 14:16:29,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=264720.0, ans=0.125 2023-11-18 14:16:32,061 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 3650, loss[loss=0.1197, simple_loss=0.145, pruned_loss=0.03798, audio_tagging_loss=0.009195, over 15500.00 frames. ], tot_loss[loss=0.1082, simple_loss=0.1209, pruned_loss=0.03612, audio_tagging_loss=0.01167, over 3055310.79 frames. 
], batch size: 55, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:16:38,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=264786.6666666667, ans=0.1 2023-11-18 14:16:38,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=264786.6666666667, ans=0.2 2023-11-18 14:16:43,160 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.645e+01 9.557e+01 1.072e+02 1.214e+02 1.788e+02, threshold=2.145e+02, percent-clipped=0.0 2023-11-18 14:16:45,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=264853.3333333333, ans=0.125 2023-11-18 14:16:47,813 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.21 vs. limit=15.0 2023-11-18 14:16:52,137 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.33 vs. limit=22.5 2023-11-18 14:17:16,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=265053.3333333333, ans=0.125 2023-11-18 14:17:18,910 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.797e-02 2023-11-18 14:17:21,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=265053.3333333333, ans=0.1 2023-11-18 14:17:27,643 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 3700, loss[loss=0.1186, simple_loss=0.1223, pruned_loss=0.04122, audio_tagging_loss=0.01626, over 15112.00 frames. ], tot_loss[loss=0.1086, simple_loss=0.1211, pruned_loss=0.0363, audio_tagging_loss=0.01172, over 3058687.55 frames. ], batch size: 57, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:17:28,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=265120.0, ans=0.1 2023-11-18 14:17:35,699 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.67 vs. limit=12.0 2023-11-18 14:17:37,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=265186.6666666667, ans=0.07 2023-11-18 14:17:50,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=265253.3333333333, ans=0.0 2023-11-18 14:18:15,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=265386.6666666667, ans=0.0 2023-11-18 14:18:23,561 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 3750, loss[loss=0.1357, simple_loss=0.1537, pruned_loss=0.04715, audio_tagging_loss=0.01171, over 15896.00 frames. ], tot_loss[loss=0.1095, simple_loss=0.122, pruned_loss=0.03671, audio_tagging_loss=0.01178, over 3059567.38 frames. ], batch size: 56, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:18:27,129 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.37 vs. 
limit=10.0 2023-11-18 14:18:34,675 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.605e+01 1.034e+02 1.153e+02 1.284e+02 1.931e+02, threshold=2.306e+02, percent-clipped=0.0 2023-11-18 14:18:40,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=265520.0, ans=0.125 2023-11-18 14:18:43,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=265520.0, ans=0.125 2023-11-18 14:18:55,745 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.25 vs. limit=22.5 2023-11-18 14:19:02,317 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 14:19:03,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=265653.3333333333, ans=0.125 2023-11-18 14:19:09,840 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.66 vs. limit=10.0 2023-11-18 14:19:19,473 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.33 vs. limit=10.0 2023-11-18 14:19:19,902 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 3800, loss[loss=0.1248, simple_loss=0.1415, pruned_loss=0.04483, audio_tagging_loss=0.009231, over 15107.00 frames. ], tot_loss[loss=0.1094, simple_loss=0.1216, pruned_loss=0.03672, audio_tagging_loss=0.01185, over 3056928.10 frames. ], batch size: 59, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:19:24,001 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.69 vs. limit=8.0 2023-11-18 14:19:30,865 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.47 vs. limit=15.0 2023-11-18 14:19:35,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=265853.3333333333, ans=0.125 2023-11-18 14:19:47,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=265920.0, ans=0.1 2023-11-18 14:20:11,264 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.40 vs. limit=15.0 2023-11-18 14:20:15,029 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 3850, loss[loss=0.1021, simple_loss=0.1191, pruned_loss=0.0335, audio_tagging_loss=0.008999, over 14738.00 frames. ], tot_loss[loss=0.1091, simple_loss=0.1215, pruned_loss=0.03648, audio_tagging_loss=0.01188, over 3049203.44 frames. 
], batch size: 55, lr: 1.68e-02, grad_scale: 32.0 2023-11-18 14:20:22,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=266120.0, ans=0.1 2023-11-18 14:20:22,149 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:20:26,229 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.817e+01 9.423e+01 1.054e+02 1.147e+02 1.619e+02, threshold=2.108e+02, percent-clipped=0.0 2023-11-18 14:20:56,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=266320.0, ans=0.0 2023-11-18 14:21:05,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=266386.6666666667, ans=0.05 2023-11-18 14:21:10,653 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 3900, loss[loss=0.1169, simple_loss=0.1239, pruned_loss=0.04227, audio_tagging_loss=0.01263, over 14411.00 frames. ], tot_loss[loss=0.1094, simple_loss=0.1218, pruned_loss=0.03652, audio_tagging_loss=0.01201, over 3047544.57 frames. ], batch size: 56, lr: 1.68e-02, grad_scale: 32.0 2023-11-18 14:21:31,026 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.39 vs. limit=12.0 2023-11-18 14:21:32,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=266586.6666666667, ans=0.125 2023-11-18 14:22:03,198 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.83 vs. limit=12.0 2023-11-18 14:22:04,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=266720.0, ans=0.07 2023-11-18 14:22:10,118 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 3950, loss[loss=0.1263, simple_loss=0.1462, pruned_loss=0.04304, audio_tagging_loss=0.01018, over 14086.00 frames. ], tot_loss[loss=0.1092, simple_loss=0.1215, pruned_loss=0.03638, audio_tagging_loss=0.0121, over 3041368.99 frames. 
], batch size: 53, lr: 1.68e-02, grad_scale: 32.0 2023-11-18 14:22:20,673 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 9.384e+01 1.022e+02 1.131e+02 1.477e+02, threshold=2.044e+02, percent-clipped=0.0 2023-11-18 14:22:25,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=266853.3333333333, ans=0.2 2023-11-18 14:22:38,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=266920.0, ans=0.0 2023-11-18 14:22:41,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=266920.0, ans=0.125 2023-11-18 14:22:41,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=266986.6666666667, ans=0.0 2023-11-18 14:23:02,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=267053.3333333333, ans=0.1 2023-11-18 14:23:03,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=267053.3333333333, ans=0.0 2023-11-18 14:23:05,088 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 4000, loss[loss=0.118, simple_loss=0.1253, pruned_loss=0.042, audio_tagging_loss=0.01339, over 15261.00 frames. ], tot_loss[loss=0.1098, simple_loss=0.1223, pruned_loss=0.03655, audio_tagging_loss=0.01212, over 3037399.03 frames. ], batch size: 58, lr: 1.68e-02, grad_scale: 64.0 2023-11-18 14:23:13,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=267120.0, ans=0.0 2023-11-18 14:23:44,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=267320.0, ans=0.95 2023-11-18 14:23:55,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=267386.6666666667, ans=0.0 2023-11-18 14:24:01,221 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 4050, loss[loss=0.08528, simple_loss=0.09115, pruned_loss=0.02661, audio_tagging_loss=0.0131, over 14460.00 frames. ], tot_loss[loss=0.11, simple_loss=0.1223, pruned_loss=0.03667, audio_tagging_loss=0.01222, over 3041198.15 frames. ], batch size: 55, lr: 1.68e-02, grad_scale: 64.0 2023-11-18 14:24:03,460 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 14:24:12,550 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.178e+01 9.604e+01 1.092e+02 1.269e+02 1.663e+02, threshold=2.185e+02, percent-clipped=0.0 2023-11-18 14:24:15,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=267520.0, ans=0.1 2023-11-18 14:24:27,695 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:24:34,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=267653.3333333333, ans=0.2 2023-11-18 14:24:42,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=267653.3333333333, ans=0.0 2023-11-18 14:24:57,451 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 4100, loss[loss=0.1294, simple_loss=0.154, pruned_loss=0.04244, audio_tagging_loss=0.009946, over 15590.00 frames. ], tot_loss[loss=0.1094, simple_loss=0.1218, pruned_loss=0.03633, audio_tagging_loss=0.01215, over 3043428.25 frames. ], batch size: 57, lr: 1.68e-02, grad_scale: 64.0 2023-11-18 14:25:15,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=267853.3333333333, ans=0.0 2023-11-18 14:25:18,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=267920.0, ans=0.125 2023-11-18 14:25:33,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=267986.6666666667, ans=0.95 2023-11-18 14:25:34,915 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.27 vs. limit=10.0 2023-11-18 14:25:41,561 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.70 vs. limit=10.0 2023-11-18 14:25:49,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=268053.3333333333, ans=0.125 2023-11-18 14:25:53,605 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 4150, loss[loss=0.1095, simple_loss=0.12, pruned_loss=0.03964, audio_tagging_loss=0.00986, over 13688.00 frames. ], tot_loss[loss=0.1093, simple_loss=0.1218, pruned_loss=0.03642, audio_tagging_loss=0.01196, over 3038950.85 frames. ], batch size: 52, lr: 1.68e-02, grad_scale: 64.0 2023-11-18 14:25:53,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=268120.0, ans=0.125 2023-11-18 14:25:55,284 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.30 vs. limit=15.0 2023-11-18 14:25:59,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=268120.0, ans=0.125 2023-11-18 14:26:04,173 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.914e+01 9.487e+01 1.055e+02 1.166e+02 1.501e+02, threshold=2.109e+02, percent-clipped=0.0 2023-11-18 14:26:04,830 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.04 vs. 
limit=15.0
2023-11-18 14:26:21,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=268253.3333333333, ans=0.2
2023-11-18 14:26:31,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=268320.0, ans=0.125
2023-11-18 14:26:31,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=268320.0, ans=0.0
2023-11-18 14:26:33,424 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 14:26:35,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=268320.0, ans=0.04949747468305833
2023-11-18 14:26:48,311 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 4200, loss[loss=0.0977, simple_loss=0.1165, pruned_loss=0.02949, audio_tagging_loss=0.009966, over 15237.00 frames. ], tot_loss[loss=0.1087, simple_loss=0.1212, pruned_loss=0.03624, audio_tagging_loss=0.01183, over 3040281.76 frames. ], batch size: 56, lr: 1.68e-02, grad_scale: 64.0
2023-11-18 14:27:00,769 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.71 vs. limit=15.0
2023-11-18 14:27:14,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=268586.6666666667, ans=0.0
2023-11-18 14:27:15,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=268586.6666666667, ans=0.0
2023-11-18 14:27:30,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=268653.3333333333, ans=0.2
2023-11-18 14:27:44,856 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 4250, loss[loss=0.09408, simple_loss=0.1117, pruned_loss=0.0284, audio_tagging_loss=0.009821, over 14283.00 frames. ], tot_loss[loss=0.1108, simple_loss=0.1239, pruned_loss=0.03722, audio_tagging_loss=0.01169, over 3045911.80 frames. ], batch size: 53, lr: 1.68e-02, grad_scale: 32.0
2023-11-18 14:27:45,381 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. limit=6.0
2023-11-18 14:27:49,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=268786.6666666667, ans=0.0
2023-11-18 14:27:54,000 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.54 vs. limit=15.0
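
The Whitening lines, like the conv_module2.whiten entry just above, report a per-module metric against a configured limit. As I read icefall's scaling.py (treat the exact formula as an assumption), the metric measures how far the channel covariance of a module's output is from a multiple of the identity: 1.0 for perfectly "white" features, approaching num_channels as the covariance collapses toward rank one, with a corrective gradient applied when the limit is exceeded. A single-group sketch of that quantity:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels). Returns 1.0 when cov(x) is a multiple
        # of the identity ("white"), approaching num_channels as the covariance
        # collapses toward rank one. Sketch of the idea behind the Whitening
        # diagnostic, not a copy of scaling.py.
        x = x - x.mean(dim=0, keepdim=True)
        cov = x.t() @ x                       # unnormalised channel covariance
        num_channels = cov.shape[0]
        return (num_channels * (cov * cov).sum() / cov.trace() ** 2).item()

    x = torch.randn(1000, 512)
    print(whitening_metric(x))                                   # near-white: ~1.5
    print(whitening_metric(x * torch.linspace(0.1, 3.0, 512)))   # noticeably larger
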
2023-11-18 14:27:57,587 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 9.513e+01 1.037e+02 1.128e+02 1.811e+02, threshold=2.074e+02, percent-clipped=0.0
2023-11-18 14:28:02,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=268853.3333333333, ans=0.0
2023-11-18 14:28:07,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=268920.0, ans=0.0
2023-11-18 14:28:08,889 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.28 vs. limit=15.0
2023-11-18 14:28:37,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=269053.3333333333, ans=0.125
2023-11-18 14:28:38,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=269053.3333333333, ans=0.125
2023-11-18 14:28:40,387 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0
2023-11-18 14:28:41,080 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 4300, loss[loss=0.1363, simple_loss=0.1563, pruned_loss=0.0468, audio_tagging_loss=0.01133, over 16532.00 frames. ], tot_loss[loss=0.111, simple_loss=0.124, pruned_loss=0.03737, audio_tagging_loss=0.01167, over 3045675.25 frames. ], batch size: 59, lr: 1.68e-02, grad_scale: 32.0
2023-11-18 14:28:45,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=269120.0, ans=0.0
2023-11-18 14:28:47,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=269120.0, ans=0.125
2023-11-18 14:28:58,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=269186.6666666667, ans=0.035
2023-11-18 14:28:58,817 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.34 vs. limit=15.0
2023-11-18 14:29:30,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=269386.6666666667, ans=0.125
2023-11-18 14:29:33,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=269386.6666666667, ans=0.125
2023-11-18 14:29:36,919 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 4350, loss[loss=0.09682, simple_loss=0.09933, pruned_loss=0.03411, audio_tagging_loss=0.01304, over 15132.00 frames. ], tot_loss[loss=0.1104, simple_loss=0.1236, pruned_loss=0.03699, audio_tagging_loss=0.01161, over 3043350.68 frames.
], batch size: 61, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:29:48,987 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.972e+01 1.042e+02 1.123e+02 1.311e+02 1.927e+02, threshold=2.246e+02, percent-clipped=0.0 2023-11-18 14:29:49,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=269520.0, ans=0.1 2023-11-18 14:29:50,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=269520.0, ans=0.0 2023-11-18 14:29:55,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=269520.0, ans=0.1 2023-11-18 14:30:27,160 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.94 vs. limit=22.5 2023-11-18 14:30:27,268 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.47 vs. limit=12.0 2023-11-18 14:30:31,880 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 4400, loss[loss=0.1006, simple_loss=0.1196, pruned_loss=0.03025, audio_tagging_loss=0.01052, over 15608.00 frames. ], tot_loss[loss=0.1102, simple_loss=0.1235, pruned_loss=0.03676, audio_tagging_loss=0.01163, over 3048238.53 frames. ], batch size: 57, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:30:43,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=269853.3333333333, ans=0.2 2023-11-18 14:30:45,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=269853.3333333333, ans=0.05 2023-11-18 14:30:57,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=269920.0, ans=0.125 2023-11-18 14:30:59,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=269920.0, ans=0.125 2023-11-18 14:31:00,136 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.79 vs. limit=6.0 2023-11-18 14:31:06,665 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.68 vs. limit=15.0 2023-11-18 14:31:28,576 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 4450, loss[loss=0.1203, simple_loss=0.1373, pruned_loss=0.04139, audio_tagging_loss=0.01032, over 15484.00 frames. ], tot_loss[loss=0.1093, simple_loss=0.123, pruned_loss=0.03638, audio_tagging_loss=0.01143, over 3051341.57 frames. 
], batch size: 57, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:31:40,212 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.353e+01 9.826e+01 1.064e+02 1.191e+02 1.732e+02, threshold=2.129e+02, percent-clipped=0.0 2023-11-18 14:32:10,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=270320.0, ans=0.125 2023-11-18 14:32:13,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=270386.6666666667, ans=0.125 2023-11-18 14:32:22,219 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.52 vs. limit=15.0 2023-11-18 14:32:23,830 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 4500, loss[loss=0.07581, simple_loss=0.08617, pruned_loss=0.02164, audio_tagging_loss=0.01109, over 14425.00 frames. ], tot_loss[loss=0.1092, simple_loss=0.1228, pruned_loss=0.0362, audio_tagging_loss=0.01154, over 3052916.00 frames. ], batch size: 55, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:32:51,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=270586.6666666667, ans=0.1 2023-11-18 14:33:00,896 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=12.0 2023-11-18 14:33:20,080 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 4550, loss[loss=0.138, simple_loss=0.1621, pruned_loss=0.04375, audio_tagging_loss=0.01322, over 15927.00 frames. ], tot_loss[loss=0.11, simple_loss=0.1235, pruned_loss=0.03666, audio_tagging_loss=0.01154, over 3054594.57 frames. ], batch size: 56, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:33:21,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=270786.6666666667, ans=0.125 2023-11-18 14:33:28,086 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.84 vs. limit=12.0 2023-11-18 14:33:33,252 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.057e+01 9.598e+01 1.091e+02 1.194e+02 2.832e+02, threshold=2.183e+02, percent-clipped=1.0 2023-11-18 14:33:49,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=270920.0, ans=0.125 2023-11-18 14:33:51,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=270920.0, ans=0.125 2023-11-18 14:34:02,581 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 14:34:04,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=271053.3333333333, ans=0.125 2023-11-18 14:34:17,072 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 4600, loss[loss=0.1246, simple_loss=0.1373, pruned_loss=0.04212, audio_tagging_loss=0.01385, over 15866.00 frames. 
], tot_loss[loss=0.1099, simple_loss=0.1229, pruned_loss=0.03675, audio_tagging_loss=0.01171, over 3052267.30 frames. ], batch size: 60, lr: 1.67e-02, grad_scale: 32.0
2023-11-18 14:34:17,521 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.21 vs. limit=22.5
2023-11-18 14:34:19,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=271120.0, ans=0.125
2023-11-18 14:34:52,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=271320.0, ans=0.1
2023-11-18 14:35:12,026 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 4650, loss[loss=0.08542, simple_loss=0.09665, pruned_loss=0.02323, audio_tagging_loss=0.01387, over 16581.00 frames. ], tot_loss[loss=0.1089, simple_loss=0.1216, pruned_loss=0.03632, audio_tagging_loss=0.01176, over 3054735.58 frames. ], batch size: 63, lr: 1.67e-02, grad_scale: 32.0
2023-11-18 14:35:24,106 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.025e+01 1.020e+02 1.140e+02 1.306e+02 2.124e+02, threshold=2.280e+02, percent-clipped=0.0
2023-11-18 14:35:24,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=271520.0, ans=0.0
2023-11-18 14:35:45,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=271653.3333333333, ans=0.1
2023-11-18 14:35:45,666 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.80 vs. limit=15.0
2023-11-18 14:36:00,038 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 14:36:07,759 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 4700, loss[loss=0.1387, simple_loss=0.163, pruned_loss=0.04785, audio_tagging_loss=0.009306, over 15529.00 frames. ], tot_loss[loss=0.1093, simple_loss=0.122, pruned_loss=0.03646, audio_tagging_loss=0.01186, over 3057014.98 frames. ], batch size: 54, lr: 1.67e-02, grad_scale: 32.0
2023-11-18 14:36:11,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=271786.6666666667, ans=0.125
2023-11-18 14:36:32,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=271920.0, ans=0.0
2023-11-18 14:36:35,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=271920.0, ans=0.0
2023-11-18 14:36:38,429 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.53 vs. limit=15.0
2023-11-18 14:36:40,429 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.10 vs. limit=22.5
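
The many ScheduledFloat lines (dropout_p, skip rates, balancer probs) show hyper-parameters that are functions of the training step rather than constants: each is logged as a name, the current batch_count, and its value ans. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints below are made up for illustration:

    # Sketch (assumption): a hyper-parameter that is piecewise-linear in
    # batch_count, which is what the ScheduledFloat log lines suggest.
    class ScheduledFloat:
        def __init__(self, *points):
            # points: (batch_count, value) breakpoints
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # A dropout annealing from 0.3 to 0.1 over the first 20k counts settles
    # long before batch_count ~ 272000, which is why the entries above all
    # show final values such as ans=0.1 or ans=0.125.
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p.value(272000.0))  # 0.1
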
2023-11-18 14:36:42,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=271986.6666666667, ans=0.1
2023-11-18 14:36:43,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=271986.6666666667, ans=0.1
2023-11-18 14:36:51,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=272053.3333333333, ans=0.2
2023-11-18 14:36:55,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=272053.3333333333, ans=0.125
2023-11-18 14:37:01,446 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.39 vs. limit=15.0
2023-11-18 14:37:04,180 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 4750, loss[loss=0.0848, simple_loss=0.09439, pruned_loss=0.02796, audio_tagging_loss=0.009654, over 14803.00 frames. ], tot_loss[loss=0.1093, simple_loss=0.1218, pruned_loss=0.03647, audio_tagging_loss=0.01189, over 3049302.03 frames. ], batch size: 57, lr: 1.67e-02, grad_scale: 32.0
2023-11-18 14:37:05,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=272120.0, ans=0.125
2023-11-18 14:37:06,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=272120.0, ans=0.0
2023-11-18 14:37:08,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=272120.0, ans=0.125
2023-11-18 14:37:14,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=272186.6666666667, ans=0.0
2023-11-18 14:37:16,362 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.682e+01 9.592e+01 1.080e+02 1.195e+02 1.652e+02, threshold=2.159e+02, percent-clipped=0.0
2023-11-18 14:37:20,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=272186.6666666667, ans=0.125
2023-11-18 14:37:23,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=272186.6666666667, ans=0.125
2023-11-18 14:37:38,618 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.96 vs. limit=15.0
2023-11-18 14:37:47,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=272386.6666666667, ans=0.04949747468305833
2023-11-18 14:37:59,706 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 4800, loss[loss=0.08266, simple_loss=0.08157, pruned_loss=0.02682, audio_tagging_loss=0.01506, over 15008.00 frames. ], tot_loss[loss=0.1098, simple_loss=0.1222, pruned_loss=0.03667, audio_tagging_loss=0.01202, over 3051955.58 frames.
], batch size: 58, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:38:03,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=272453.3333333333, ans=0.125 2023-11-18 14:38:18,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=272520.0, ans=0.0 2023-11-18 14:38:27,017 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.516e-03 2023-11-18 14:38:27,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=272586.6666666667, ans=0.5 2023-11-18 14:38:30,592 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.50 vs. limit=15.0 2023-11-18 14:38:33,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=272653.3333333333, ans=0.0 2023-11-18 14:38:42,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=272653.3333333333, ans=0.125 2023-11-18 14:38:52,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=272720.0, ans=0.125 2023-11-18 14:38:53,014 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=15.0 2023-11-18 14:38:55,231 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 4850, loss[loss=0.1099, simple_loss=0.121, pruned_loss=0.03683, audio_tagging_loss=0.01257, over 14048.00 frames. ], tot_loss[loss=0.1106, simple_loss=0.123, pruned_loss=0.03695, audio_tagging_loss=0.01212, over 3051961.29 frames. ], batch size: 54, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:39:00,711 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:39:07,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=272853.3333333333, ans=0.0 2023-11-18 14:39:07,952 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.577e+01 9.439e+01 1.075e+02 1.233e+02 2.240e+02, threshold=2.150e+02, percent-clipped=1.0 2023-11-18 14:39:09,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=272853.3333333333, ans=0.125 2023-11-18 14:39:19,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=272920.0, ans=0.0 2023-11-18 14:39:34,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=272986.6666666667, ans=0.0 2023-11-18 14:39:37,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=272986.6666666667, ans=0.125 2023-11-18 14:39:43,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=273053.3333333333, ans=0.5 2023-11-18 14:39:50,686 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.41 vs. 
limit=15.0 2023-11-18 14:39:51,330 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 4900, loss[loss=0.1276, simple_loss=0.1557, pruned_loss=0.04, audio_tagging_loss=0.009753, over 15773.00 frames. ], tot_loss[loss=0.1096, simple_loss=0.1221, pruned_loss=0.03654, audio_tagging_loss=0.012, over 3041072.42 frames. ], batch size: 54, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:40:04,565 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.35 vs. limit=15.0 2023-11-18 14:40:05,657 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.34 vs. limit=10.0 2023-11-18 14:40:22,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=273253.3333333333, ans=0.0 2023-11-18 14:40:31,170 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2023-11-18 14:40:46,585 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 4950, loss[loss=0.1111, simple_loss=0.1321, pruned_loss=0.03497, audio_tagging_loss=0.01009, over 15660.00 frames. ], tot_loss[loss=0.109, simple_loss=0.1218, pruned_loss=0.03634, audio_tagging_loss=0.01179, over 3037344.17 frames. ], batch size: 56, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:40:58,334 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.86 vs. limit=15.0 2023-11-18 14:40:58,627 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.761e+01 9.503e+01 1.074e+02 1.226e+02 1.825e+02, threshold=2.148e+02, percent-clipped=0.0 2023-11-18 14:41:16,822 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.67 vs. limit=15.0 2023-11-18 14:41:17,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=273586.6666666667, ans=0.125 2023-11-18 14:41:27,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=273653.3333333333, ans=0.1 2023-11-18 14:41:42,363 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 5000, loss[loss=0.1259, simple_loss=0.1328, pruned_loss=0.05042, audio_tagging_loss=0.009067, over 15732.00 frames. ], tot_loss[loss=0.1092, simple_loss=0.1222, pruned_loss=0.0365, audio_tagging_loss=0.01159, over 3039342.34 frames. 
], batch size: 59, lr: 1.66e-02, grad_scale: 32.0
2023-11-18 14:41:46,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=273786.6666666667, ans=0.0
2023-11-18 14:41:57,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=273853.3333333333, ans=10.0
2023-11-18 14:41:58,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=273853.3333333333, ans=0.04949747468305833
2023-11-18 14:42:05,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=273920.0, ans=0.125
2023-11-18 14:42:12,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=273920.0, ans=0.125
2023-11-18 14:42:38,351 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 5050, loss[loss=0.1375, simple_loss=0.1615, pruned_loss=0.04734, audio_tagging_loss=0.009353, over 14820.00 frames. ], tot_loss[loss=0.1092, simple_loss=0.1226, pruned_loss=0.03655, audio_tagging_loss=0.01137, over 3038500.37 frames. ], batch size: 54, lr: 1.66e-02, grad_scale: 16.0
2023-11-18 14:42:39,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=274120.0, ans=0.125
2023-11-18 14:42:51,025 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 9.577e+01 1.097e+02 1.238e+02 1.791e+02, threshold=2.193e+02, percent-clipped=0.0
2023-11-18 14:43:03,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=274253.3333333333, ans=0.025
2023-11-18 14:43:19,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=274320.0, ans=0.95
2023-11-18 14:43:32,748 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 5100, loss[loss=0.1005, simple_loss=0.1255, pruned_loss=0.02955, audio_tagging_loss=0.008167, over 13578.00 frames. ], tot_loss[loss=0.108, simple_loss=0.1208, pruned_loss=0.0362, audio_tagging_loss=0.01142, over 3032936.76 frames. ], batch size: 55, lr: 1.66e-02, grad_scale: 16.0
2023-11-18 14:43:39,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=274453.3333333333, ans=0.125
2023-11-18 14:43:41,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=274453.3333333333, ans=0.0
2023-11-18 14:43:42,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=274520.0, ans=0.125
2023-11-18 14:43:51,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=274520.0, ans=0.125
2023-11-18 14:44:15,262 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=7.817e-02
2023-11-18 14:44:15,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=274653.3333333333, ans=0.125
2023-11-18 14:44:27,775 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 5150, loss[loss=0.1068, simple_loss=0.127, pruned_loss=0.0322, audio_tagging_loss=0.0111, over 15337.00 frames. ], tot_loss[loss=0.1079, simple_loss=0.1208, pruned_loss=0.03602, audio_tagging_loss=0.01141, over 3041018.77 frames. ], batch size: 56, lr: 1.66e-02, grad_scale: 16.0
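
The grad_scale field is worth watching here: it falls from 32.0 to 16.0 at batch 5050 and is back at 32.0 by batch 5200. With use_fp16=True this is the standard dynamic loss-scaling pattern: the scale is halved whenever the scaled gradients overflow (and that step is skipped), then grows back after a run of clean steps. A generic sketch using torch.cuda.amp, which is ordinary PyTorch rather than this recipe's exact training loop:

    import torch

    # Generic fp16 step with dynamic loss scaling (standard PyTorch pattern;
    # train_asr.py's loop differs in detail). An inf/NaN in the scaled
    # gradients makes the scaler skip the optimizer step and halve its scale
    # (32.0 -> 16.0 above); after enough clean steps it doubles it back.
    scaler = torch.cuda.amp.GradScaler()

    def train_step(model, optimizer, criterion, inputs, targets):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(inputs), targets)
        scaler.scale(loss).backward()   # backprop through the scaled loss
        scaler.step(optimizer)          # unscales grads, steps unless overflow
        scaler.update()                 # adjusts the grad_scale logged above
        return loss.detach()
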
2023-11-18 14:44:41,459 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.141e+01 9.655e+01 1.078e+02 1.221e+02 1.622e+02, threshold=2.156e+02, percent-clipped=0.0
2023-11-18 14:45:00,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=274986.6666666667, ans=0.0
2023-11-18 14:45:23,358 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 5200, loss[loss=0.09743, simple_loss=0.1138, pruned_loss=0.03235, audio_tagging_loss=0.008172, over 15726.00 frames. ], tot_loss[loss=0.1082, simple_loss=0.1213, pruned_loss=0.03616, audio_tagging_loss=0.01146, over 3035684.66 frames. ], batch size: 59, lr: 1.66e-02, grad_scale: 32.0
2023-11-18 14:45:36,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=275186.6666666667, ans=0.0
2023-11-18 14:45:39,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=275186.6666666667, ans=0.1
2023-11-18 14:45:42,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=275186.6666666667, ans=0.125
2023-11-18 14:46:16,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=275386.6666666667, ans=0.0
2023-11-18 14:46:18,240 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 5250, loss[loss=0.1014, simple_loss=0.1037, pruned_loss=0.03611, audio_tagging_loss=0.01342, over 14876.00 frames. ], tot_loss[loss=0.1094, simple_loss=0.1226, pruned_loss=0.0367, audio_tagging_loss=0.01141, over 3035090.17 frames. ], batch size: 58, lr: 1.66e-02, grad_scale: 32.0
2023-11-18 14:46:30,876 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.969e+01 9.429e+01 1.029e+02 1.136e+02 1.567e+02, threshold=2.057e+02, percent-clipped=0.0
2023-11-18 14:46:41,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=275586.6666666667, ans=0.125
2023-11-18 14:46:52,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=275653.3333333333, ans=0.125
2023-11-18 14:46:53,386 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 14:47:02,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.96 vs. limit=22.5
2023-11-18 14:47:12,125 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 5300, loss[loss=0.08719, simple_loss=0.09073, pruned_loss=0.03141, audio_tagging_loss=0.01041, over 15023.00 frames. ], tot_loss[loss=0.1099, simple_loss=0.1234, pruned_loss=0.03678, audio_tagging_loss=0.01142, over 3032667.59 frames.
], batch size: 57, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:47:27,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=275853.3333333333, ans=0.0 2023-11-18 14:47:38,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=275920.0, ans=0.125 2023-11-18 14:48:01,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=276053.3333333333, ans=0.0 2023-11-18 14:48:07,983 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 5350, loss[loss=0.08775, simple_loss=0.1005, pruned_loss=0.02703, audio_tagging_loss=0.01046, over 14326.00 frames. ], tot_loss[loss=0.1091, simple_loss=0.1222, pruned_loss=0.03638, audio_tagging_loss=0.01161, over 3030440.14 frames. ], batch size: 55, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:48:14,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=276120.0, ans=0.125 2023-11-18 14:48:21,232 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.530e+01 9.718e+01 1.034e+02 1.191e+02 1.805e+02, threshold=2.068e+02, percent-clipped=0.0 2023-11-18 14:48:34,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=276253.3333333333, ans=0.1 2023-11-18 14:48:39,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=276320.0, ans=0.0 2023-11-18 14:48:51,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=276386.6666666667, ans=0.07 2023-11-18 14:49:03,117 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 5400, loss[loss=0.1101, simple_loss=0.1257, pruned_loss=0.0346, audio_tagging_loss=0.01269, over 14812.00 frames. ], tot_loss[loss=0.1098, simple_loss=0.1231, pruned_loss=0.03659, audio_tagging_loss=0.0116, over 3031163.56 frames. ], batch size: 53, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:49:03,808 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.69 vs. limit=15.0 2023-11-18 14:49:18,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=276520.0, ans=0.125 2023-11-18 14:49:23,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=276586.6666666667, ans=0.125 2023-11-18 14:49:27,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=276586.6666666667, ans=0.125 2023-11-18 14:49:29,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=276586.6666666667, ans=0.125 2023-11-18 14:49:57,655 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 5450, loss[loss=0.1041, simple_loss=0.1148, pruned_loss=0.03636, audio_tagging_loss=0.01039, over 14841.00 frames. ], tot_loss[loss=0.1089, simple_loss=0.1219, pruned_loss=0.03612, audio_tagging_loss=0.01179, over 3036220.15 frames. 
], batch size: 55, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:50:10,737 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 9.674e+01 1.094e+02 1.267e+02 1.723e+02, threshold=2.188e+02, percent-clipped=0.0 2023-11-18 14:50:15,190 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.84 vs. limit=15.0 2023-11-18 14:50:22,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=276920.0, ans=0.125 2023-11-18 14:50:30,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=276986.6666666667, ans=0.2 2023-11-18 14:50:37,504 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.70 vs. limit=15.0 2023-11-18 14:50:48,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=277053.3333333333, ans=0.09899494936611666 2023-11-18 14:50:50,977 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0 2023-11-18 14:50:52,387 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 5500, loss[loss=0.1487, simple_loss=0.1663, pruned_loss=0.05575, audio_tagging_loss=0.00977, over 15606.00 frames. ], tot_loss[loss=0.1096, simple_loss=0.1226, pruned_loss=0.03652, audio_tagging_loss=0.01179, over 3039855.74 frames. ], batch size: 58, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:51:14,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=277253.3333333333, ans=0.2 2023-11-18 14:51:21,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=277253.3333333333, ans=0.0 2023-11-18 14:51:25,680 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.59 vs. limit=15.0 2023-11-18 14:51:47,652 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 5550, loss[loss=0.07446, simple_loss=0.07333, pruned_loss=0.02309, audio_tagging_loss=0.0147, over 13874.00 frames. ], tot_loss[loss=0.1105, simple_loss=0.1237, pruned_loss=0.03685, audio_tagging_loss=0.01182, over 3049225.93 frames. ], batch size: 56, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:52:00,287 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.658e+01 9.567e+01 1.041e+02 1.171e+02 1.468e+02, threshold=2.082e+02, percent-clipped=0.0 2023-11-18 14:52:14,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=277586.6666666667, ans=0.0 2023-11-18 14:52:17,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=277586.6666666667, ans=0.2 2023-11-18 14:52:23,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.86 vs. 
limit=6.0
2023-11-18 14:52:29,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=277653.3333333333, ans=0.1
2023-11-18 14:52:31,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=277720.0, ans=0.07
2023-11-18 14:52:32,747 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.556e-02
2023-11-18 14:52:35,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=277720.0, ans=0.1
2023-11-18 14:52:41,984 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 5600, loss[loss=0.1457, simple_loss=0.1624, pruned_loss=0.05485, audio_tagging_loss=0.009666, over 15921.00 frames. ], tot_loss[loss=0.1099, simple_loss=0.1227, pruned_loss=0.03655, audio_tagging_loss=0.012, over 3050593.97 frames. ], batch size: 58, lr: 1.65e-02, grad_scale: 32.0
2023-11-18 14:52:50,659 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 14:52:54,712 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.20 vs. limit=22.5
2023-11-18 14:52:55,697 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.09 vs. limit=15.0
2023-11-18 14:53:19,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=277986.6666666667, ans=0.0
2023-11-18 14:53:21,532 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 14:53:29,296 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.02 vs. limit=15.0
2023-11-18 14:53:33,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=278053.3333333333, ans=0.0
2023-11-18 14:53:35,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=278120.0, ans=0.125
2023-11-18 14:53:36,755 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 5650, loss[loss=0.1247, simple_loss=0.1342, pruned_loss=0.04387, audio_tagging_loss=0.01374, over 15534.00 frames. ], tot_loss[loss=0.1094, simple_loss=0.1221, pruned_loss=0.03626, audio_tagging_loss=0.01205, over 3048366.62 frames. ], batch size: 58, lr: 1.65e-02, grad_scale: 32.0
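
The WARNING just above (cut ze0LsBtoDm0) is the guard against utterances too short to align: a 1.0-second AudioSet clip gives 100 feature frames, which the encoder front-end subsamples to 23, fewer than the 24 BPE tokens of its placeholder transcript, so the transducer loss would be undefined and the cut is dropped. A sketch of such a filter; the helper name and the exact subsampling formula are assumptions chosen to reproduce the logged 100 -> 23:

    # Sketch of the short-cut guard behind the "Exclude cut ..." warnings.
    # The subsampling arithmetic reproduces the logged numbers (100 -> 23);
    # the function itself is illustrative, not the actual train_asr.py code.
    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Frames surviving a Zipformer-style 4x subsampling front-end:
        # ((100 - 7) // 2 + 1) // 2 = 23
        frames_after = ((num_frames - 7) // 2 + 1) // 2
        # A transducer cannot emit more tokens than it has frames.
        return frames_after >= num_tokens

    print(keep_cut(100, 24))  # False -> "Exclude cut ... from training."
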
2023-11-18 14:53:36,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=278120.0, ans=0.05
2023-11-18 14:53:40,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=278120.0, ans=0.125
2023-11-18 14:53:50,486 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.691e+01 9.354e+01 1.022e+02 1.173e+02 1.530e+02, threshold=2.043e+02, percent-clipped=0.0
2023-11-18 14:53:54,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=278186.6666666667, ans=0.0
2023-11-18 14:53:55,216 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.73 vs. limit=22.5
2023-11-18 14:53:57,302 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.30 vs. limit=15.0
2023-11-18 14:54:05,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=278253.3333333333, ans=0.125
2023-11-18 14:54:11,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=278320.0, ans=0.2
2023-11-18 14:54:17,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=278320.0, ans=0.125
2023-11-18 14:54:20,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=278386.6666666667, ans=0.1
2023-11-18 14:54:32,127 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 5700, loss[loss=0.09041, simple_loss=0.1011, pruned_loss=0.02969, audio_tagging_loss=0.01018, over 16010.00 frames. ], tot_loss[loss=0.1083, simple_loss=0.1211, pruned_loss=0.03579, audio_tagging_loss=0.01198, over 3039638.84 frames. ], batch size: 64, lr: 1.65e-02, grad_scale: 32.0
2023-11-18 14:54:47,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=278520.0, ans=0.125
2023-11-18 14:54:49,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=278520.0, ans=0.2
2023-11-18 14:54:53,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=278586.6666666667, ans=0.0
2023-11-18 14:54:57,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=278586.6666666667, ans=0.125
2023-11-18 14:55:16,067 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.62 vs. limit=15.0
2023-11-18 14:55:19,134 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.34 vs.
limit=10.0 2023-11-18 14:55:22,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=278720.0, ans=0.0 2023-11-18 14:55:24,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=278720.0, ans=0.0 2023-11-18 14:55:27,007 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 5750, loss[loss=0.07612, simple_loss=0.08304, pruned_loss=0.02243, audio_tagging_loss=0.01218, over 13922.00 frames. ], tot_loss[loss=0.1073, simple_loss=0.1201, pruned_loss=0.03541, audio_tagging_loss=0.01179, over 3040365.03 frames. ], batch size: 54, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:55:40,247 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.975e+01 9.668e+01 1.031e+02 1.141e+02 1.503e+02, threshold=2.062e+02, percent-clipped=0.0 2023-11-18 14:55:51,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=278920.0, ans=0.2 2023-11-18 14:55:55,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=278920.0, ans=0.125 2023-11-18 14:56:13,801 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.18 vs. limit=15.0 2023-11-18 14:56:17,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=279053.3333333333, ans=0.0 2023-11-18 14:56:22,412 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 5800, loss[loss=0.07837, simple_loss=0.08437, pruned_loss=0.02152, audio_tagging_loss=0.01466, over 14311.00 frames. ], tot_loss[loss=0.108, simple_loss=0.1209, pruned_loss=0.0358, audio_tagging_loss=0.01181, over 3049950.53 frames. ], batch size: 56, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:56:28,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=279120.0, ans=0.125 2023-11-18 14:56:29,387 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.57 vs. limit=6.0 2023-11-18 14:57:02,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=279320.0, ans=0.0 2023-11-18 14:57:18,279 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 5850, loss[loss=0.101, simple_loss=0.121, pruned_loss=0.03182, audio_tagging_loss=0.008718, over 15112.00 frames. ], tot_loss[loss=0.1066, simple_loss=0.119, pruned_loss=0.03528, audio_tagging_loss=0.01186, over 3050823.91 frames. 
], batch size: 57, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:57:31,475 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.015e+01 9.648e+01 1.054e+02 1.215e+02 1.872e+02, threshold=2.108e+02, percent-clipped=0.0 2023-11-18 14:57:41,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=279586.6666666667, ans=0.125 2023-11-18 14:57:52,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=279653.3333333333, ans=0.1 2023-11-18 14:58:07,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=279720.0, ans=0.1 2023-11-18 14:58:13,667 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 5900, loss[loss=0.1081, simple_loss=0.1182, pruned_loss=0.0364, audio_tagging_loss=0.01261, over 16272.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.12, pruned_loss=0.03534, audio_tagging_loss=0.0118, over 3055793.25 frames. ], batch size: 60, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 14:58:33,678 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.67 vs. limit=12.0 2023-11-18 14:58:39,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=279920.0, ans=0.125 2023-11-18 14:58:43,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=279920.0, ans=0.125 2023-11-18 14:58:44,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=279920.0, ans=0.0 2023-11-18 14:58:56,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=279986.6666666667, ans=0.125 2023-11-18 14:58:57,398 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:59:08,889 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 5950, loss[loss=0.0885, simple_loss=0.1022, pruned_loss=0.02791, audio_tagging_loss=0.009494, over 14739.00 frames. ], tot_loss[loss=0.1065, simple_loss=0.1197, pruned_loss=0.03491, audio_tagging_loss=0.01173, over 3048966.25 frames. 
], batch size: 54, lr: 1.64e-02, grad_scale: 32.0
2023-11-18 14:59:22,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=280186.6666666667, ans=0.2
2023-11-18 14:59:23,196 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.674e+01 1.041e+02 1.163e+02 1.306e+02 1.742e+02, threshold=2.325e+02, percent-clipped=0.0
2023-11-18 14:59:29,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=280186.6666666667, ans=0.125
2023-11-18 14:59:42,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=280320.0, ans=0.125
2023-11-18 14:59:44,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=280320.0, ans=0.0
2023-11-18 14:59:50,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=280320.0, ans=0.0
2023-11-18 15:00:05,302 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 6000, loss[loss=0.1199, simple_loss=0.1317, pruned_loss=0.04211, audio_tagging_loss=0.01192, over 15885.00 frames. ], tot_loss[loss=0.1077, simple_loss=0.1209, pruned_loss=0.03555, audio_tagging_loss=0.0117, over 3047851.43 frames. ], batch size: 58, lr: 1.64e-02, grad_scale: 32.0
2023-11-18 15:00:05,303 INFO [train_asr.py:1138] (3/4) Computing validation loss
2023-11-18 15:00:38,403 INFO [train_asr.py:1147] (3/4) Epoch 4, validation: loss=0.07584, simple_loss=0.06235, pruned_loss=0.0102, audio_tagging_loss=0.03446, over 4681554.00 frames.
2023-11-18 15:00:38,404 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB
2023-11-18 15:00:39,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=280453.3333333333, ans=0.125
2023-11-18 15:00:48,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=280520.0, ans=15.0
2023-11-18 15:01:00,527 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0
2023-11-18 15:01:18,964 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 15:01:33,797 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 6050, loss[loss=0.1113, simple_loss=0.1254, pruned_loss=0.03955, audio_tagging_loss=0.008996, over 15582.00 frames. ], tot_loss[loss=0.1087, simple_loss=0.1222, pruned_loss=0.03604, audio_tagging_loss=0.01158, over 3043092.95 frames. ], batch size: 58, lr: 1.64e-02, grad_scale: 32.0
2023-11-18 15:01:35,268 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=22.5
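
In every optim.py line in this section, the reported threshold is Clipping_scale times the logged median gradient norm (here 2.325e+02 ~= 2.0 x 1.163e+02, up to print rounding), with percent-clipped counting how often recent batches hit it. A sketch of that bookkeeping, assuming a rolling buffer of recent norms; the buffer length and reporting cadence are illustrative, not the exact icefall code:

    import torch

    # Sketch: median-based gradient clipping consistent with the optim.py
    # lines (threshold = clipping_scale * median of recent gradient norms).
    class GradNormClipper:
        def __init__(self, clipping_scale: float = 2.0, history: int = 1000):
            self.clipping_scale = clipping_scale
            self.history = history
            self.norms = []        # rolling buffer of recent total grad norms
            self.num_clipped = 0
            self.num_steps = 0

        def clip_(self, parameters) -> float:
            params = [p for p in parameters if p.grad is not None]
            norm = torch.norm(torch.stack(
                [p.grad.detach().norm() for p in params])).item()
            self.norms = (self.norms + [norm])[-self.history:]
            median = sorted(self.norms)[len(self.norms) // 2]
            threshold = self.clipping_scale * median   # e.g. 2.0 * 1.163e+02
            self.num_steps += 1
            if norm > threshold:
                self.num_clipped += 1                  # feeds percent-clipped
                for p in params:
                    p.grad.mul_(threshold / norm)      # rescale to threshold
            return norm
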
2023-11-18 15:01:47,523 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.202e+01 9.320e+01 1.035e+02 1.195e+02 1.658e+02, threshold=2.071e+02, percent-clipped=0.0
2023-11-18 15:01:49,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=280853.3333333333, ans=0.0
2023-11-18 15:02:01,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=280920.0, ans=0.2
2023-11-18 15:02:06,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=280986.6666666667, ans=0.125
2023-11-18 15:02:19,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=281053.3333333333, ans=0.2
2023-11-18 15:02:29,895 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 6100, loss[loss=0.1152, simple_loss=0.1323, pruned_loss=0.03769, audio_tagging_loss=0.0114, over 15488.00 frames. ], tot_loss[loss=0.1075, simple_loss=0.1206, pruned_loss=0.03558, audio_tagging_loss=0.0116, over 3046098.22 frames. ], batch size: 58, lr: 1.64e-02, grad_scale: 32.0
2023-11-18 15:02:43,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=281186.6666666667, ans=0.1
2023-11-18 15:03:04,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=281320.0, ans=0.125
2023-11-18 15:03:08,039 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-18 15:03:12,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=281320.0, ans=0.125
2023-11-18 15:03:13,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=281386.6666666667, ans=0.125
2023-11-18 15:03:15,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=281386.6666666667, ans=0.125
2023-11-18 15:03:19,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=281386.6666666667, ans=0.2
2023-11-18 15:03:20,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=281386.6666666667, ans=0.0
2023-11-18 15:03:24,794 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 6150, loss[loss=0.08826, simple_loss=0.1028, pruned_loss=0.02538, audio_tagging_loss=0.0115, over 14845.00 frames. ], tot_loss[loss=0.1078, simple_loss=0.1207, pruned_loss=0.03582, audio_tagging_loss=0.01164, over 3045411.84 frames. ], batch size: 55, lr: 1.64e-02, grad_scale: 32.0
2023-11-18 15:03:25,555 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.38 vs. limit=15.0
2023-11-18 15:03:38,017 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.635e+01 9.712e+01 1.096e+02 1.258e+02 1.781e+02, threshold=2.192e+02, percent-clipped=0.0
2023-11-18 15:03:53,358 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.57 vs. limit=8.0
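In each optim.py line the five numbers read as min/25%/median/75%/max of recent gradient norms, and the clipping threshold is evidently Clipping_scale times the median (2.0 * 1.035e+02 = 2.071e+02 immediately above, and the same relation holds in the other optim.py lines in this stretch); percent-clipped then reports how often the norm exceeded the threshold. A sketch under that reading, not the literal optim.py code:

    import torch

    # Hedged sketch: derive the clipping threshold from a window of recent
    # gradient norms, as the quartile printouts suggest.
    def clipping_threshold(recent_grad_norms, clipping_scale=2.0):
        return clipping_scale * torch.as_tensor(recent_grad_norms).median().item()

    def clip_factor(grad_norm, threshold):
        # multiply gradients by this factor; 1.0 means the batch was not clipped
        return min(1.0, threshold / grad_norm)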
2023-11-18 15:03:58,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=281653.3333333333, ans=0.05
2023-11-18 15:04:17,911 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.81 vs. limit=6.0
2023-11-18 15:04:18,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=281720.0, ans=0.07
2023-11-18 15:04:18,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=281720.0, ans=0.1
2023-11-18 15:04:20,406 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 6200, loss[loss=0.1353, simple_loss=0.1572, pruned_loss=0.04703, audio_tagging_loss=0.009626, over 14834.00 frames. ], tot_loss[loss=0.1081, simple_loss=0.121, pruned_loss=0.03579, audio_tagging_loss=0.0118, over 3044905.16 frames. ], batch size: 54, lr: 1.64e-02, grad_scale: 32.0
2023-11-18 15:04:41,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=281853.3333333333, ans=0.0
2023-11-18 15:04:54,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=281986.6666666667, ans=0.125
2023-11-18 15:05:17,031 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 6250, loss[loss=0.1388, simple_loss=0.1657, pruned_loss=0.04813, audio_tagging_loss=0.007792, over 15828.00 frames. ], tot_loss[loss=0.1084, simple_loss=0.1211, pruned_loss=0.03601, audio_tagging_loss=0.01184, over 3043205.70 frames. ], batch size: 56, lr: 1.64e-02, grad_scale: 32.0
2023-11-18 15:05:29,638 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.742e+01 9.446e+01 1.080e+02 1.226e+02 1.932e+02, threshold=2.161e+02, percent-clipped=0.0
2023-11-18 15:05:44,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=282253.3333333333, ans=0.1
2023-11-18 15:05:56,974 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.47 vs. limit=15.0
2023-11-18 15:06:11,964 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 6300, loss[loss=0.09746, simple_loss=0.1092, pruned_loss=0.0349, audio_tagging_loss=0.007971, over 16033.00 frames. ], tot_loss[loss=0.1083, simple_loss=0.121, pruned_loss=0.03589, audio_tagging_loss=0.01192, over 3041717.27 frames. ], batch size: 60, lr: 1.64e-02, grad_scale: 32.0
2023-11-18 15:06:17,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=282453.3333333333, ans=0.125
2023-11-18 15:06:55,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=282720.0, ans=0.0
2023-11-18 15:07:07,479 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 6350, loss[loss=0.1043, simple_loss=0.1263, pruned_loss=0.02831, audio_tagging_loss=0.01285, over 16190.00 frames. ], tot_loss[loss=0.1091, simple_loss=0.1219, pruned_loss=0.03624, audio_tagging_loss=0.01192, over 3041369.72 frames. ], batch size: 61, lr: 1.64e-02, grad_scale: 32.0
2023-11-18 15:07:10,168 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.33 vs. limit=15.0
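The scaling.py:213 lines track ScheduledFloat objects: scalar hyperparameters (dropout rates, skip probabilities, balancer probs) that vary piecewise-linearly with batch_count. By batch_count ~2.8e5 most of them have flattened at their final value, which is why the same ans keeps repeating. A sketch of such a schedule, with illustrative breakpoints rather than icefall's actual ones:

    # Hedged sketch of a ScheduledFloat-like schedule: piecewise-linear in
    # batch_count. The breakpoints below are illustrative, not icefall's.
    class ScheduledFloatSketch:
        def __init__(self, *points):
            self.points = sorted(points)  # (batch_count, value) pairs

        def __call__(self, batch_count):
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
            return pts[-1][1]

    dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
    assert dropout_p(279720.0) == 0.1  # matches the encoder_embed.dropout.p record above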
2023-11-18 15:07:13,875 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.39 vs. limit=10.0
2023-11-18 15:07:21,674 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.750e+01 9.648e+01 1.090e+02 1.229e+02 1.753e+02, threshold=2.179e+02, percent-clipped=0.0
2023-11-18 15:07:31,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=282920.0, ans=0.1
2023-11-18 15:07:54,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=283053.3333333333, ans=0.125
2023-11-18 15:07:54,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=283053.3333333333, ans=0.2
2023-11-18 15:08:03,799 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 6400, loss[loss=0.1023, simple_loss=0.1092, pruned_loss=0.03321, audio_tagging_loss=0.01451, over 13933.00 frames. ], tot_loss[loss=0.1086, simple_loss=0.1212, pruned_loss=0.03595, audio_tagging_loss=0.01199, over 3043750.67 frames. ], batch size: 54, lr: 1.64e-02, grad_scale: 32.0
2023-11-18 15:08:09,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=283120.0, ans=0.2
2023-11-18 15:08:09,658 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.51 vs. limit=22.5
2023-11-18 15:08:28,502 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.02 vs. limit=22.5
2023-11-18 15:08:33,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=283253.3333333333, ans=0.0
2023-11-18 15:08:38,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=283320.0, ans=0.125
2023-11-18 15:08:40,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=283320.0, ans=0.05
2023-11-18 15:08:58,500 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 6450, loss[loss=0.1136, simple_loss=0.1198, pruned_loss=0.03901, audio_tagging_loss=0.01471, over 15346.00 frames. ], tot_loss[loss=0.1086, simple_loss=0.1212, pruned_loss=0.03593, audio_tagging_loss=0.01207, over 3036231.66 frames. ], batch size: 59, lr: 1.63e-02, grad_scale: 32.0
2023-11-18 15:09:02,267 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.42 vs. limit=15.0
2023-11-18 15:09:11,021 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.683e+01 9.197e+01 1.014e+02 1.179e+02 1.440e+02, threshold=2.029e+02, percent-clipped=0.0
2023-11-18 15:09:36,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=283653.3333333333, ans=0.0
2023-11-18 15:09:53,326 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 6500, loss[loss=0.1236, simple_loss=0.1346, pruned_loss=0.04442, audio_tagging_loss=0.01187, over 14147.00 frames. ], tot_loss[loss=0.1099, simple_loss=0.123, pruned_loss=0.03647, audio_tagging_loss=0.01196, over 3037522.84 frames. ], batch size: 53, lr: 1.63e-02, grad_scale: 32.0
2023-11-18 15:10:04,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=283853.3333333333, ans=0.1
2023-11-18 15:10:10,573 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.67 vs. limit=15.0
2023-11-18 15:10:24,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=283920.0, ans=0.125
2023-11-18 15:10:24,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=283920.0, ans=0.2
2023-11-18 15:10:34,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=283986.6666666667, ans=0.1
2023-11-18 15:10:48,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=284053.3333333333, ans=0.125
2023-11-18 15:10:49,937 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 6550, loss[loss=0.1272, simple_loss=0.1458, pruned_loss=0.044, audio_tagging_loss=0.01032, over 15409.00 frames. ], tot_loss[loss=0.1098, simple_loss=0.1232, pruned_loss=0.0364, audio_tagging_loss=0.01179, over 3040531.84 frames. ], batch size: 55, lr: 1.63e-02, grad_scale: 32.0
2023-11-18 15:10:55,342 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.28 vs. limit=22.5
2023-11-18 15:11:01,690 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.84 vs. limit=15.0
2023-11-18 15:11:03,059 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.721e+01 9.628e+01 1.072e+02 1.195e+02 1.710e+02, threshold=2.144e+02, percent-clipped=0.0
2023-11-18 15:11:36,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=284386.6666666667, ans=0.0
2023-11-18 15:11:45,570 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 6600, loss[loss=0.137, simple_loss=0.162, pruned_loss=0.04756, audio_tagging_loss=0.008476, over 15921.00 frames. ], tot_loss[loss=0.1098, simple_loss=0.123, pruned_loss=0.03651, audio_tagging_loss=0.01176, over 3045022.12 frames. ], batch size: 58, lr: 1.63e-02, grad_scale: 32.0
2023-11-18 15:11:46,101 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.74 vs. limit=22.5
2023-11-18 15:11:58,697 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.06 vs. limit=15.0
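The scaling.py:1022 lines compare a per-module "whiteness" statistic of the activations against a limit (metric=13.06 vs. limit=15.0 above); the whitening penalty only has work to do when the metric overshoots the limit, as in the 23.74 vs. 22.5 line just before it. One plausible statistic is the dispersion of the covariance eigenvalues, which equals 1.0 for perfectly white features; the formula below is an assumption, not icefall's exact metric:

    import torch

    # Hedged sketch of a whitening metric: eigenvalue dispersion of the
    # per-group feature covariance (1.0 == already white).
    def whitening_metric(x, num_groups=1):
        # x: (num_frames, num_channels)
        b, c = x.shape
        x = x.reshape(b, num_groups, c // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = x.transpose(1, 2) @ x / b    # (num_groups, c/g, c/g)
        eigs = torch.linalg.eigvalsh(cov)  # symmetric -> real eigenvalues
        return ((eigs ** 2).mean() / eigs.mean() ** 2).item()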
2023-11-18 15:12:19,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=284653.3333333333, ans=0.0
2023-11-18 15:12:35,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=284720.0, ans=0.0
2023-11-18 15:12:36,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=284720.0, ans=0.2
2023-11-18 15:12:37,602 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 15:12:40,475 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 6650, loss[loss=0.07903, simple_loss=0.08891, pruned_loss=0.02134, audio_tagging_loss=0.01324, over 16644.00 frames. ], tot_loss[loss=0.1095, simple_loss=0.1227, pruned_loss=0.03634, audio_tagging_loss=0.01179, over 3048839.40 frames. ], batch size: 63, lr: 1.63e-02, grad_scale: 32.0
2023-11-18 15:12:50,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=284853.3333333333, ans=0.125
2023-11-18 15:12:54,221 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.970e+01 9.511e+01 1.065e+02 1.198e+02 1.619e+02, threshold=2.129e+02, percent-clipped=0.0
2023-11-18 15:12:54,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=284853.3333333333, ans=0.125
2023-11-18 15:12:55,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=284853.3333333333, ans=0.2
2023-11-18 15:12:56,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=284853.3333333333, ans=0.1
2023-11-18 15:12:58,505 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.24 vs. limit=22.5
2023-11-18 15:13:04,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=284920.0, ans=0.1
2023-11-18 15:13:14,654 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.95 vs. limit=15.0
2023-11-18 15:13:24,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=285053.3333333333, ans=0.125
2023-11-18 15:13:30,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=285053.3333333333, ans=0.125
2023-11-18 15:13:36,291 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 6700, loss[loss=0.08862, simple_loss=0.1013, pruned_loss=0.0255, audio_tagging_loss=0.01249, over 15011.00 frames. ], tot_loss[loss=0.1083, simple_loss=0.1214, pruned_loss=0.03578, audio_tagging_loss=0.0118, over 3042857.55 frames. ], batch size: 58, lr: 1.63e-02, grad_scale: 32.0
2023-11-18 15:13:47,122 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 15:13:58,306 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.28 vs. limit=10.0
2023-11-18 15:14:29,122 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.75 vs. limit=15.0
2023-11-18 15:14:32,996 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 6750, loss[loss=0.1232, simple_loss=0.1406, pruned_loss=0.04213, audio_tagging_loss=0.01079, over 15212.00 frames. ], tot_loss[loss=0.1074, simple_loss=0.1204, pruned_loss=0.03533, audio_tagging_loss=0.01188, over 3034958.06 frames. ], batch size: 57, lr: 1.63e-02, grad_scale: 32.0
2023-11-18 15:14:39,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=285453.3333333333, ans=0.125
2023-11-18 15:14:44,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=285520.0, ans=0.1
2023-11-18 15:14:45,690 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.590e+01 9.541e+01 1.044e+02 1.172e+02 1.686e+02, threshold=2.089e+02, percent-clipped=0.0
2023-11-18 15:14:53,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=285586.6666666667, ans=0.125
2023-11-18 15:15:06,963 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.67 vs. limit=15.0
2023-11-18 15:15:12,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=285653.3333333333, ans=0.0
2023-11-18 15:15:21,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.whiten.whitening_limit, batch_count=285720.0, ans=12.0
2023-11-18 15:15:28,142 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 6800, loss[loss=0.09914, simple_loss=0.1136, pruned_loss=0.03113, audio_tagging_loss=0.0112, over 14827.00 frames. ], tot_loss[loss=0.1084, simple_loss=0.1218, pruned_loss=0.03579, audio_tagging_loss=0.01169, over 3033886.05 frames. ], batch size: 56, lr: 1.63e-02, grad_scale: 32.0
2023-11-18 15:15:39,838 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.56 vs. limit=15.0
2023-11-18 15:15:43,450 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.75 vs. limit=15.0
2023-11-18 15:15:51,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=285920.0, ans=0.125
2023-11-18 15:15:54,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=285920.0, ans=0.1
2023-11-18 15:15:57,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=285920.0, ans=0.09899494936611666
2023-11-18 15:16:02,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=285986.6666666667, ans=0.1
2023-11-18 15:16:21,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=286053.3333333333, ans=0.025
2023-11-18 15:16:21,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=286053.3333333333, ans=0.125
2023-11-18 15:16:23,775 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 6850, loss[loss=0.1048, simple_loss=0.1258, pruned_loss=0.03019, audio_tagging_loss=0.0117, over 15505.00 frames. ], tot_loss[loss=0.1088, simple_loss=0.1226, pruned_loss=0.03601, audio_tagging_loss=0.01146, over 3034349.90 frames. ], batch size: 58, lr: 1.63e-02, grad_scale: 32.0
2023-11-18 15:16:24,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=286120.0, ans=0.0
2023-11-18 15:16:35,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=286186.6666666667, ans=0.0
2023-11-18 15:16:37,993 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.225e+01 9.571e+01 1.055e+02 1.193e+02 1.601e+02, threshold=2.111e+02, percent-clipped=0.0
2023-11-18 15:16:45,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=286253.3333333333, ans=0.0
2023-11-18 15:17:20,142 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 6900, loss[loss=0.1153, simple_loss=0.131, pruned_loss=0.04094, audio_tagging_loss=0.008893, over 14591.00 frames. ], tot_loss[loss=0.1087, simple_loss=0.1224, pruned_loss=0.03586, audio_tagging_loss=0.01158, over 3035877.27 frames. ], batch size: 56, lr: 1.63e-02, grad_scale: 32.0
2023-11-18 15:17:22,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=286453.3333333333, ans=0.1
2023-11-18 15:17:26,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=286453.3333333333, ans=0.0
2023-11-18 15:17:37,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=286520.0, ans=0.125
2023-11-18 15:18:04,992 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 15:18:12,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=286720.0, ans=0.0
2023-11-18 15:18:13,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=286720.0, ans=0.125
2023-11-18 15:18:15,638 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 6950, loss[loss=0.1321, simple_loss=0.1468, pruned_loss=0.04355, audio_tagging_loss=0.01515, over 15073.00 frames. ], tot_loss[loss=0.1085, simple_loss=0.1219, pruned_loss=0.03585, audio_tagging_loss=0.01174, over 3034523.81 frames. ], batch size: 55, lr: 1.63e-02, grad_scale: 32.0
2023-11-18 15:18:15,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=286786.6666666667, ans=0.0
2023-11-18 15:18:25,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=286853.3333333333, ans=0.125
2023-11-18 15:18:28,217 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.06 vs. limit=15.0
2023-11-18 15:18:28,743 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.933e+01 9.398e+01 1.033e+02 1.158e+02 1.660e+02, threshold=2.066e+02, percent-clipped=0.0
2023-11-18 15:18:31,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=286853.3333333333, ans=0.125
2023-11-18 15:18:42,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=286920.0, ans=0.125
2023-11-18 15:18:44,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=286920.0, ans=0.1
2023-11-18 15:19:11,043 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 7000, loss[loss=0.139, simple_loss=0.1552, pruned_loss=0.05065, audio_tagging_loss=0.0107, over 15368.00 frames. ], tot_loss[loss=0.1079, simple_loss=0.1211, pruned_loss=0.03551, audio_tagging_loss=0.01188, over 3040687.94 frames. ], batch size: 55, lr: 1.62e-02, grad_scale: 32.0
2023-11-18 15:19:16,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=287120.0, ans=0.2
2023-11-18 15:19:25,190 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.70 vs. limit=15.0
2023-11-18 15:19:36,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=287253.3333333333, ans=0.0
2023-11-18 15:19:46,084 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.29 vs. limit=15.0
2023-11-18 15:19:56,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=287386.6666666667, ans=0.1
2023-11-18 15:20:00,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=287386.6666666667, ans=0.125
2023-11-18 15:20:04,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=287386.6666666667, ans=0.2
2023-11-18 15:20:05,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=287386.6666666667, ans=0.1
2023-11-18 15:20:07,123 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 7050, loss[loss=0.08382, simple_loss=0.09115, pruned_loss=0.0248, audio_tagging_loss=0.01345, over 15323.00 frames. ], tot_loss[loss=0.1078, simple_loss=0.1206, pruned_loss=0.03542, audio_tagging_loss=0.01203, over 3025794.13 frames. ], batch size: 58, lr: 1.62e-02, grad_scale: 64.0
2023-11-18 15:20:17,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=287520.0, ans=0.1
2023-11-18 15:20:20,226 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.175e+01 9.557e+01 1.044e+02 1.189e+02 1.971e+02, threshold=2.089e+02, percent-clipped=0.0
2023-11-18 15:20:30,432 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.04 vs. limit=15.0
2023-11-18 15:20:59,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=287720.0, ans=0.0
2023-11-18 15:21:02,537 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 7100, loss[loss=0.09665, simple_loss=0.1089, pruned_loss=0.02936, audio_tagging_loss=0.01285, over 15159.00 frames. ], tot_loss[loss=0.1086, simple_loss=0.1219, pruned_loss=0.03576, audio_tagging_loss=0.01184, over 3035643.12 frames. ], batch size: 57, lr: 1.62e-02, grad_scale: 64.0
2023-11-18 15:21:24,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=287920.0, ans=0.125
2023-11-18 15:21:27,132 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.70 vs. limit=15.0
2023-11-18 15:21:55,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=288053.3333333333, ans=0.0
2023-11-18 15:21:57,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=288120.0, ans=0.125
2023-11-18 15:21:58,404 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 7150, loss[loss=0.1224, simple_loss=0.1441, pruned_loss=0.03738, audio_tagging_loss=0.01299, over 15544.00 frames. ], tot_loss[loss=0.1091, simple_loss=0.1224, pruned_loss=0.03595, audio_tagging_loss=0.0119, over 3043521.64 frames. ], batch size: 56, lr: 1.62e-02, grad_scale: 64.0
2023-11-18 15:21:58,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=288120.0, ans=0.125
2023-11-18 15:22:00,100 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.93 vs. limit=22.5
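grad_scale doubles from 32.0 to 64.0 at batch 7050 above, and later in this log it steps back down (32.0 again by batch 7250, 16.0 by batch 8050). That power-of-two movement is the signature of fp16 training with a dynamic loss scaler that grows the scale after a run of overflow-free steps and backs off when a non-finite gradient appears. torch.cuda.amp.GradScaler behaves this way; whether this run uses it directly or icefall's own scaler is an assumption, and the constants below are illustrative:

    import torch

    # Hedged sketch: a standard dynamic loss-scaling setup that would produce
    # the power-of-two grad_scale values seen in this log.
    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,
        growth_factor=2.0,      # 32 -> 64 after enough clean steps
        backoff_factor=0.5,     # 64 -> 32 -> 16 on non-finite gradients
        growth_interval=2000,
    )

    # typical use inside the training loop:
    #   with torch.cuda.amp.autocast():
    #       loss = compute_loss(model, batch)
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()  # the scale grows or backs off here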
2023-11-18 15:22:01,001 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0
2023-11-18 15:22:12,084 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.925e+01 9.651e+01 1.094e+02 1.204e+02 1.585e+02, threshold=2.188e+02, percent-clipped=0.0
2023-11-18 15:22:20,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=288253.3333333333, ans=10.0
2023-11-18 15:22:38,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=288320.0, ans=0.125
2023-11-18 15:22:40,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=288320.0, ans=0.125
2023-11-18 15:22:42,247 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.09 vs. limit=15.0
2023-11-18 15:22:50,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=288386.6666666667, ans=0.2
2023-11-18 15:22:54,521 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 7200, loss[loss=0.1192, simple_loss=0.139, pruned_loss=0.03739, audio_tagging_loss=0.0123, over 14543.00 frames. ], tot_loss[loss=0.1092, simple_loss=0.1224, pruned_loss=0.03603, audio_tagging_loss=0.01199, over 3047118.51 frames. ], batch size: 57, lr: 1.62e-02, grad_scale: 64.0
2023-11-18 15:22:57,161 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.53 vs. limit=12.0
2023-11-18 15:23:04,579 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.98 vs. limit=6.0
2023-11-18 15:23:25,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=288586.6666666667, ans=0.125
2023-11-18 15:23:49,855 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 7250, loss[loss=0.1049, simple_loss=0.1198, pruned_loss=0.03319, audio_tagging_loss=0.01184, over 16094.00 frames. ], tot_loss[loss=0.1097, simple_loss=0.1232, pruned_loss=0.03618, audio_tagging_loss=0.01196, over 3047359.81 frames. ], batch size: 58, lr: 1.62e-02, grad_scale: 32.0
2023-11-18 15:24:03,639 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.867e+01 9.776e+01 1.072e+02 1.209e+02 1.575e+02, threshold=2.144e+02, percent-clipped=0.0
2023-11-18 15:24:25,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=288986.6666666667, ans=0.0
2023-11-18 15:24:34,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=289053.3333333333, ans=0.1
2023-11-18 15:24:44,974 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 7300, loss[loss=0.0934, simple_loss=0.09752, pruned_loss=0.03229, audio_tagging_loss=0.01235, over 15757.00 frames. ], tot_loss[loss=0.1102, simple_loss=0.1238, pruned_loss=0.03655, audio_tagging_loss=0.01172, over 3044179.87 frames. ], batch size: 62, lr: 1.62e-02, grad_scale: 32.0
2023-11-18 15:24:51,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=289120.0, ans=0.125
2023-11-18 15:25:16,900 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0
2023-11-18 15:25:25,450 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. limit=6.0
2023-11-18 15:25:26,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=289320.0, ans=0.125
2023-11-18 15:25:28,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=289386.6666666667, ans=0.0
2023-11-18 15:25:29,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=289386.6666666667, ans=0.0
2023-11-18 15:25:40,809 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 7350, loss[loss=0.04477, simple_loss=0.04141, pruned_loss=0.007034, audio_tagging_loss=0.01703, over 14518.00 frames. ], tot_loss[loss=0.1094, simple_loss=0.1227, pruned_loss=0.03634, audio_tagging_loss=0.01173, over 3046245.91 frames. ], batch size: 56, lr: 1.62e-02, grad_scale: 32.0
2023-11-18 15:25:54,546 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.880e+01 9.633e+01 1.075e+02 1.263e+02 1.928e+02, threshold=2.150e+02, percent-clipped=0.0
2023-11-18 15:25:54,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=289520.0, ans=0.125
2023-11-18 15:25:57,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=289520.0, ans=0.125
2023-11-18 15:26:14,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=289653.3333333333, ans=0.125
2023-11-18 15:26:22,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=289653.3333333333, ans=0.1
2023-11-18 15:26:29,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=289720.0, ans=0.0
2023-11-18 15:26:30,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=289720.0, ans=0.125
2023-11-18 15:26:35,457 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 7400, loss[loss=0.08501, simple_loss=0.09037, pruned_loss=0.02488, audio_tagging_loss=0.01494, over 14870.00 frames. ], tot_loss[loss=0.1089, simple_loss=0.1225, pruned_loss=0.03598, audio_tagging_loss=0.0116, over 3052433.63 frames. ], batch size: 56, lr: 1.62e-02, grad_scale: 32.0
2023-11-18 15:26:40,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=289786.6666666667, ans=0.2
2023-11-18 15:27:05,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=289920.0, ans=0.125
2023-11-18 15:27:16,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=289986.6666666667, ans=0.125
2023-11-18 15:27:30,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=290120.0, ans=0.0
2023-11-18 15:27:30,964 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 7450, loss[loss=0.07265, simple_loss=0.07853, pruned_loss=0.01864, audio_tagging_loss=0.01474, over 15168.00 frames. ], tot_loss[loss=0.1085, simple_loss=0.1221, pruned_loss=0.03591, audio_tagging_loss=0.01158, over 3052573.43 frames. ], batch size: 57, lr: 1.62e-02, grad_scale: 32.0
2023-11-18 15:27:32,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=290120.0, ans=0.1
2023-11-18 15:27:41,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=290186.6666666667, ans=0.2
2023-11-18 15:27:45,774 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.35 vs. limit=22.5
2023-11-18 15:27:46,285 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 9.437e+01 1.026e+02 1.201e+02 2.000e+02, threshold=2.053e+02, percent-clipped=0.0
2023-11-18 15:27:53,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=290253.3333333333, ans=0.1
2023-11-18 15:27:56,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=290253.3333333333, ans=0.1
2023-11-18 15:27:58,371 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.99 vs. limit=6.0
2023-11-18 15:28:02,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=290253.3333333333, ans=0.04949747468305833
2023-11-18 15:28:07,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=290320.0, ans=0.125
2023-11-18 15:28:27,307 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 7500, loss[loss=0.09623, simple_loss=0.1095, pruned_loss=0.02872, audio_tagging_loss=0.01274, over 15266.00 frames. ], tot_loss[loss=0.1089, simple_loss=0.1226, pruned_loss=0.03615, audio_tagging_loss=0.01146, over 3056755.19 frames. ], batch size: 56, lr: 1.62e-02, grad_scale: 32.0
2023-11-18 15:28:35,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=290453.3333333333, ans=0.1
2023-11-18 15:28:45,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=290520.0, ans=0.0
2023-11-18 15:28:52,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=290586.6666666667, ans=0.1
2023-11-18 15:29:04,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=290653.3333333333, ans=0.125
2023-11-18 15:29:12,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=290720.0, ans=0.1
2023-11-18 15:29:21,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=290720.0, ans=15.0
2023-11-18 15:29:22,436 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 7550, loss[loss=0.115, simple_loss=0.127, pruned_loss=0.0402, audio_tagging_loss=0.01126, over 15003.00 frames. ], tot_loss[loss=0.1077, simple_loss=0.1211, pruned_loss=0.03571, audio_tagging_loss=0.01139, over 3053687.90 frames. ], batch size: 56, lr: 1.61e-02, grad_scale: 32.0
2023-11-18 15:29:32,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=290853.3333333333, ans=0.09899494936611666
2023-11-18 15:29:36,108 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.781e+01 9.490e+01 1.043e+02 1.208e+02 1.931e+02, threshold=2.087e+02, percent-clipped=0.0
2023-11-18 15:29:46,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=290920.0, ans=0.0
2023-11-18 15:29:55,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=290986.6666666667, ans=0.125
2023-11-18 15:30:09,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=291053.3333333333, ans=0.015
2023-11-18 15:30:12,407 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.80 vs. limit=10.0
2023-11-18 15:30:15,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0
2023-11-18 15:30:17,206 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 7600, loss[loss=0.1099, simple_loss=0.131, pruned_loss=0.03557, audio_tagging_loss=0.0089, over 16012.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.1204, pruned_loss=0.03542, audio_tagging_loss=0.01144, over 3050111.10 frames. ], batch size: 58, lr: 1.61e-02, grad_scale: 32.0
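The learning rate decays smoothly within the epoch (1.64e-02 near batch 6000, 1.61e-02 by batch 7550). This is the shape an Eden-style schedule produces, decaying in both batch index and epoch; the form below, and the suggestion that this run uses it, are assumptions based on icefall convention rather than anything printed in this stretch of the log:

    # Hedged sketch of an Eden-style learning-rate schedule.
    def eden_lr(base_lr, step, epoch, lr_batches=7500.0, lr_epochs=3.5):
        step_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * step_factor * epoch_factor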
2023-11-18 15:30:23,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=291120.0, ans=0.0
2023-11-18 15:30:26,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=291120.0, ans=0.2
2023-11-18 15:30:29,066 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 15:30:34,698 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.88 vs. limit=10.0
2023-11-18 15:31:12,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=291453.3333333333, ans=0.0
2023-11-18 15:31:13,071 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 7650, loss[loss=0.1257, simple_loss=0.1376, pruned_loss=0.04523, audio_tagging_loss=0.0117, over 15495.00 frames. ], tot_loss[loss=0.1089, simple_loss=0.1227, pruned_loss=0.03613, audio_tagging_loss=0.01144, over 3047880.65 frames. ], batch size: 59, lr: 1.61e-02, grad_scale: 32.0
2023-11-18 15:31:27,117 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 9.408e+01 1.037e+02 1.133e+02 1.442e+02, threshold=2.074e+02, percent-clipped=0.0
2023-11-18 15:31:31,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=291520.0, ans=0.0
2023-11-18 15:31:40,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=291586.6666666667, ans=0.125
2023-11-18 15:31:54,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=291653.3333333333, ans=0.125
2023-11-18 15:32:03,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=291720.0, ans=0.035
2023-11-18 15:32:08,456 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 7700, loss[loss=0.113, simple_loss=0.1204, pruned_loss=0.04062, audio_tagging_loss=0.01223, over 15968.00 frames. ], tot_loss[loss=0.1093, simple_loss=0.1231, pruned_loss=0.03631, audio_tagging_loss=0.01147, over 3047901.82 frames. ], batch size: 59, lr: 1.61e-02, grad_scale: 32.0
2023-11-18 15:32:12,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=291786.6666666667, ans=0.125
2023-11-18 15:32:17,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=291786.6666666667, ans=0.0
2023-11-18 15:32:32,400 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.35 vs. limit=15.0
2023-11-18 15:32:33,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=291920.0, ans=0.0
2023-11-18 15:32:41,923 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.97 vs. limit=15.0
2023-11-18 15:32:53,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=292053.3333333333, ans=0.09899494936611666
2023-11-18 15:33:03,725 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 7750, loss[loss=0.1051, simple_loss=0.1051, pruned_loss=0.041, audio_tagging_loss=0.01153, over 15087.00 frames. ], tot_loss[loss=0.1084, simple_loss=0.1217, pruned_loss=0.03592, audio_tagging_loss=0.01161, over 3040799.36 frames. ], batch size: 58, lr: 1.61e-02, grad_scale: 32.0
2023-11-18 15:33:15,981 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=15.01 vs. limit=15.0
2023-11-18 15:33:18,447 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.257e+01 9.507e+01 1.083e+02 1.273e+02 2.415e+02, threshold=2.165e+02, percent-clipped=1.0
2023-11-18 15:33:19,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=292186.6666666667, ans=0.125
2023-11-18 15:33:28,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=292253.3333333333, ans=0.0
2023-11-18 15:33:59,633 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 7800, loss[loss=0.1308, simple_loss=0.1469, pruned_loss=0.04537, audio_tagging_loss=0.01192, over 15803.00 frames. ], tot_loss[loss=0.1096, simple_loss=0.1233, pruned_loss=0.0364, audio_tagging_loss=0.01156, over 3038737.21 frames. ], batch size: 57, lr: 1.61e-02, grad_scale: 32.0
2023-11-18 15:34:46,685 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=22.5
2023-11-18 15:34:52,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=292720.0, ans=0.125
2023-11-18 15:34:55,492 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 7850, loss[loss=0.1169, simple_loss=0.139, pruned_loss=0.03728, audio_tagging_loss=0.01014, over 15229.00 frames. ], tot_loss[loss=0.1089, simple_loss=0.1223, pruned_loss=0.03614, audio_tagging_loss=0.01161, over 3040998.53 frames. ], batch size: 59, lr: 1.61e-02, grad_scale: 32.0
2023-11-18 15:34:57,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=292786.6666666667, ans=0.125
2023-11-18 15:34:59,355 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.61 vs. limit=22.5
2023-11-18 15:35:09,098 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.474e+01 9.851e+01 1.052e+02 1.175e+02 1.725e+02, threshold=2.105e+02, percent-clipped=0.0
2023-11-18 15:35:14,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=292853.3333333333, ans=0.1
2023-11-18 15:35:14,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=292853.3333333333, ans=0.95
2023-11-18 15:35:32,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=292986.6666666667, ans=0.0
2023-11-18 15:35:36,100 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0
2023-11-18 15:35:45,264 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-18 15:35:48,795 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.76 vs. limit=15.0
2023-11-18 15:35:50,171 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 7900, loss[loss=0.1036, simple_loss=0.1209, pruned_loss=0.03203, audio_tagging_loss=0.01111, over 14316.00 frames. ], tot_loss[loss=0.1088, simple_loss=0.1221, pruned_loss=0.03607, audio_tagging_loss=0.01169, over 3044365.58 frames. ], batch size: 55, lr: 1.61e-02, grad_scale: 32.0
2023-11-18 15:36:01,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=293186.6666666667, ans=0.125
2023-11-18 15:36:29,574 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.39 vs. limit=15.0
2023-11-18 15:36:43,703 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.18 vs. limit=15.0
2023-11-18 15:36:47,583 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 7950, loss[loss=0.09819, simple_loss=0.1257, pruned_loss=0.02543, audio_tagging_loss=0.009927, over 15893.00 frames. ], tot_loss[loss=0.1078, simple_loss=0.1206, pruned_loss=0.03562, audio_tagging_loss=0.0119, over 3040503.49 frames. ], batch size: 59, lr: 1.61e-02, grad_scale: 32.0
2023-11-18 15:36:53,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=293453.3333333333, ans=0.0
2023-11-18 15:37:00,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=293520.0, ans=0.125
2023-11-18 15:37:02,816 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.429e+01 9.694e+01 1.093e+02 1.229e+02 1.791e+02, threshold=2.186e+02, percent-clipped=0.0
2023-11-18 15:37:02,874 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 15:37:13,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=293586.6666666667, ans=0.1
2023-11-18 15:37:15,871 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=22.5
2023-11-18 15:37:35,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=293720.0, ans=0.125
2023-11-18 15:37:37,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=293720.0, ans=0.2
2023-11-18 15:37:43,991 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 8000, loss[loss=0.1375, simple_loss=0.1538, pruned_loss=0.05025, audio_tagging_loss=0.01033, over 15180.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.1193, pruned_loss=0.03544, audio_tagging_loss=0.01203, over 3035310.40 frames. ], batch size: 57, lr: 1.61e-02, grad_scale: 32.0
2023-11-18 15:37:48,764 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.36 vs. limit=6.0
2023-11-18 15:37:51,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=293786.6666666667, ans=0.025
2023-11-18 15:38:22,402 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.59 vs. limit=15.0
2023-11-18 15:38:38,554 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 8050, loss[loss=0.08938, simple_loss=0.09966, pruned_loss=0.02813, audio_tagging_loss=0.01141, over 16487.00 frames. ], tot_loss[loss=0.107, simple_loss=0.1188, pruned_loss=0.03536, audio_tagging_loss=0.0122, over 3037918.98 frames. ], batch size: 63, lr: 1.61e-02, grad_scale: 16.0
2023-11-18 15:38:53,842 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.548e+01 1.018e+02 1.096e+02 1.204e+02 1.820e+02, threshold=2.193e+02, percent-clipped=0.0
2023-11-18 15:38:56,343 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.49 vs. limit=15.0
2023-11-18 15:39:00,115 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.13 vs. limit=22.5
2023-11-18 15:39:01,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=294253.3333333333, ans=0.0
2023-11-18 15:39:03,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=294253.3333333333, ans=0.0
2023-11-18 15:39:11,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=294320.0, ans=15.0
2023-11-18 15:39:14,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=294320.0, ans=0.025
2023-11-18 15:39:14,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=294320.0, ans=0.0
2023-11-18 15:39:30,097 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.62 vs. limit=15.0
2023-11-18 15:39:33,370 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 8100, loss[loss=0.1769, simple_loss=0.2092, pruned_loss=0.06594, audio_tagging_loss=0.006323, over 15346.00 frames. ], tot_loss[loss=0.1076, simple_loss=0.1201, pruned_loss=0.03564, audio_tagging_loss=0.01193, over 3041153.61 frames. ], batch size: 54, lr: 1.60e-02, grad_scale: 16.0
2023-11-18 15:39:33,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=294453.3333333333, ans=0.125
2023-11-18 15:39:40,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=294453.3333333333, ans=0.0
2023-11-18 15:39:43,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=294453.3333333333, ans=0.0
2023-11-18 15:39:45,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=294520.0, ans=0.1
2023-11-18 15:39:48,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=294520.0, ans=15.0
2023-11-18 15:40:10,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=294653.3333333333, ans=0.0
2023-11-18 15:40:11,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=294653.3333333333, ans=0.125
2023-11-18 15:40:28,201 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.24 vs. limit=15.0
2023-11-18 15:40:29,785 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 8150, loss[loss=0.1345, simple_loss=0.1556, pruned_loss=0.04464, audio_tagging_loss=0.01202, over 15435.00 frames. ], tot_loss[loss=0.1063, simple_loss=0.1188, pruned_loss=0.03509, audio_tagging_loss=0.01186, over 3030386.17 frames. ], batch size: 56, lr: 1.60e-02, grad_scale: 16.0
2023-11-18 15:40:31,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=294786.6666666667, ans=0.125
2023-11-18 15:40:44,587 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.218e+01 9.329e+01 1.045e+02 1.150e+02 1.655e+02, threshold=2.090e+02, percent-clipped=0.0
2023-11-18 15:40:46,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=294853.3333333333, ans=0.0
2023-11-18 15:40:58,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=294920.0, ans=0.0
2023-11-18 15:41:02,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=294986.6666666667, ans=0.125
2023-11-18 15:41:24,199 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 8200, loss[loss=0.09646, simple_loss=0.1123, pruned_loss=0.02736, audio_tagging_loss=0.01295, over 15804.00 frames. ], tot_loss[loss=0.1063, simple_loss=0.1192, pruned_loss=0.03508, audio_tagging_loss=0.01169, over 3036282.51 frames. ], batch size: 59, lr: 1.60e-02, grad_scale: 16.0
2023-11-18 15:41:26,335 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
Number of tokens: 24 2023-11-18 15:41:33,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=295186.6666666667, ans=0.0 2023-11-18 15:41:40,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=295186.6666666667, ans=0.1 2023-11-18 15:41:43,160 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2023-11-18 15:41:57,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=295320.0, ans=0.125 2023-11-18 15:41:58,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=295320.0, ans=0.0 2023-11-18 15:42:06,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=295320.0, ans=0.125 2023-11-18 15:42:19,559 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 8250, loss[loss=0.1292, simple_loss=0.1469, pruned_loss=0.0455, audio_tagging_loss=0.01025, over 14302.00 frames. ], tot_loss[loss=0.1073, simple_loss=0.1205, pruned_loss=0.03532, audio_tagging_loss=0.01171, over 3039840.05 frames. ], batch size: 52, lr: 1.60e-02, grad_scale: 16.0 2023-11-18 15:42:21,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=295453.3333333333, ans=0.125 2023-11-18 15:42:34,806 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.584e+01 9.274e+01 1.030e+02 1.127e+02 2.119e+02, threshold=2.060e+02, percent-clipped=1.0 2023-11-18 15:42:54,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=295653.3333333333, ans=0.125 2023-11-18 15:42:58,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=295653.3333333333, ans=0.125 2023-11-18 15:43:01,970 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.24 vs. limit=15.0 2023-11-18 15:43:02,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=295720.0, ans=0.125 2023-11-18 15:43:05,006 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.32 vs. limit=6.0 2023-11-18 15:43:07,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=295720.0, ans=0.125 2023-11-18 15:43:07,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=295720.0, ans=0.0 2023-11-18 15:43:15,139 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 8300, loss[loss=0.1131, simple_loss=0.1357, pruned_loss=0.03673, audio_tagging_loss=0.008535, over 16049.00 frames. ], tot_loss[loss=0.1086, simple_loss=0.1224, pruned_loss=0.03583, audio_tagging_loss=0.01154, over 3047031.96 frames. ], batch size: 59, lr: 1.60e-02, grad_scale: 16.0 2023-11-18 15:43:26,827 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.69 vs. 
limit=10.0 2023-11-18 15:43:29,104 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.64 vs. limit=22.5 2023-11-18 15:43:32,171 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.87 vs. limit=22.5 2023-11-18 15:43:34,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=295853.3333333333, ans=0.0 2023-11-18 15:43:36,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=295920.0, ans=0.1 2023-11-18 15:43:36,410 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.44 vs. limit=15.0 2023-11-18 15:43:58,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=295986.6666666667, ans=0.1 2023-11-18 15:44:01,464 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=22.5 2023-11-18 15:44:11,153 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 8350, loss[loss=0.1093, simple_loss=0.1171, pruned_loss=0.03958, audio_tagging_loss=0.01119, over 14318.00 frames. ], tot_loss[loss=0.1091, simple_loss=0.123, pruned_loss=0.03606, audio_tagging_loss=0.01153, over 3051022.55 frames. ], batch size: 54, lr: 1.60e-02, grad_scale: 16.0 2023-11-18 15:44:17,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=296120.0, ans=0.04949747468305833 2023-11-18 15:44:18,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=296120.0, ans=0.07 2023-11-18 15:44:19,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=296120.0, ans=0.1 2023-11-18 15:44:21,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=296186.6666666667, ans=0.125 2023-11-18 15:44:26,493 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.804e+01 9.554e+01 1.077e+02 1.196e+02 1.483e+02, threshold=2.155e+02, percent-clipped=0.0 2023-11-18 15:44:27,849 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:44:29,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=296186.6666666667, ans=0.125 2023-11-18 15:44:30,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=296186.6666666667, ans=0.125 2023-11-18 15:44:31,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=296253.3333333333, ans=0.1 2023-11-18 15:44:53,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=296320.0, ans=0.0 2023-11-18 15:44:59,275 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.22 vs. 
limit=15.0 2023-11-18 15:45:03,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=296386.6666666667, ans=0.0 2023-11-18 15:45:05,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=296453.3333333333, ans=0.125 2023-11-18 15:45:06,081 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 8400, loss[loss=0.1028, simple_loss=0.1125, pruned_loss=0.03518, audio_tagging_loss=0.01134, over 16325.00 frames. ], tot_loss[loss=0.1086, simple_loss=0.1224, pruned_loss=0.03585, audio_tagging_loss=0.01154, over 3045816.72 frames. ], batch size: 60, lr: 1.60e-02, grad_scale: 32.0 2023-11-18 15:45:35,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=296586.6666666667, ans=0.1 2023-11-18 15:45:36,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=296586.6666666667, ans=0.0 2023-11-18 15:45:50,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=296720.0, ans=0.125 2023-11-18 15:45:51,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=296720.0, ans=0.1 2023-11-18 15:45:51,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.87 vs. limit=15.0 2023-11-18 15:45:56,742 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=15.0 2023-11-18 15:46:02,604 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 8450, loss[loss=0.09479, simple_loss=0.0973, pruned_loss=0.03209, audio_tagging_loss=0.01405, over 14391.00 frames. ], tot_loss[loss=0.1076, simple_loss=0.1209, pruned_loss=0.03549, audio_tagging_loss=0.01165, over 3043511.55 frames. ], batch size: 55, lr: 1.60e-02, grad_scale: 32.0 2023-11-18 15:46:03,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=296786.6666666667, ans=0.125 2023-11-18 15:46:04,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=296786.6666666667, ans=0.125 2023-11-18 15:46:11,668 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. 
limit=15.0 2023-11-18 15:46:17,856 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.027e+01 9.338e+01 1.042e+02 1.138e+02 1.608e+02, threshold=2.084e+02, percent-clipped=0.0 2023-11-18 15:46:26,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=296920.0, ans=0.125 2023-11-18 15:46:47,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=297053.3333333333, ans=0.125 2023-11-18 15:46:51,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=297053.3333333333, ans=0.0 2023-11-18 15:46:53,464 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:46:57,412 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 8500, loss[loss=0.1315, simple_loss=0.1637, pruned_loss=0.04057, audio_tagging_loss=0.009076, over 15584.00 frames. ], tot_loss[loss=0.108, simple_loss=0.1213, pruned_loss=0.03571, audio_tagging_loss=0.01164, over 3038100.51 frames. ], batch size: 56, lr: 1.60e-02, grad_scale: 32.0 2023-11-18 15:47:02,854 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.13 vs. limit=12.0 2023-11-18 15:47:15,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=297186.6666666667, ans=0.2 2023-11-18 15:47:21,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=297253.3333333333, ans=0.07 2023-11-18 15:47:34,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=297320.0, ans=0.1 2023-11-18 15:47:39,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=297320.0, ans=0.125 2023-11-18 15:47:44,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=297386.6666666667, ans=0.125 2023-11-18 15:47:53,001 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 8550, loss[loss=0.1074, simple_loss=0.124, pruned_loss=0.03359, audio_tagging_loss=0.01177, over 14247.00 frames. ], tot_loss[loss=0.1082, simple_loss=0.122, pruned_loss=0.03569, audio_tagging_loss=0.0115, over 3040615.56 frames. ], batch size: 54, lr: 1.60e-02, grad_scale: 32.0 2023-11-18 15:48:02,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=297453.3333333333, ans=0.0 2023-11-18 15:48:09,325 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.282e+01 9.983e+01 1.095e+02 1.210e+02 1.627e+02, threshold=2.189e+02, percent-clipped=0.0 2023-11-18 15:48:17,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=297586.6666666667, ans=10.0 2023-11-18 15:48:38,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=297720.0, ans=0.1 2023-11-18 15:48:49,341 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 8600, loss[loss=0.0859, simple_loss=0.08685, pruned_loss=0.02827, audio_tagging_loss=0.01421, over 16447.00 frames. 
], tot_loss[loss=0.1081, simple_loss=0.122, pruned_loss=0.03554, audio_tagging_loss=0.01159, over 3048152.26 frames. ], batch size: 64, lr: 1.60e-02, grad_scale: 32.0 2023-11-18 15:48:49,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=297786.6666666667, ans=0.125 2023-11-18 15:49:00,336 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.65 vs. limit=22.5 2023-11-18 15:49:02,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=297853.3333333333, ans=0.0 2023-11-18 15:49:34,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=298053.3333333333, ans=0.125 2023-11-18 15:49:43,349 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 8650, loss[loss=0.07583, simple_loss=0.0885, pruned_loss=0.01967, audio_tagging_loss=0.01192, over 16011.00 frames. ], tot_loss[loss=0.1083, simple_loss=0.1222, pruned_loss=0.03555, audio_tagging_loss=0.01162, over 3053731.91 frames. ], batch size: 59, lr: 1.59e-02, grad_scale: 32.0 2023-11-18 15:49:58,604 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.945e+01 9.623e+01 1.078e+02 1.210e+02 1.696e+02, threshold=2.155e+02, percent-clipped=0.0 2023-11-18 15:50:14,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=298253.3333333333, ans=0.0 2023-11-18 15:50:18,190 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.07 vs. limit=15.0 2023-11-18 15:50:26,663 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.37 vs. limit=15.0 2023-11-18 15:50:38,377 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 8700, loss[loss=0.09376, simple_loss=0.1037, pruned_loss=0.02956, audio_tagging_loss=0.01236, over 16708.00 frames. ], tot_loss[loss=0.1075, simple_loss=0.1211, pruned_loss=0.03525, audio_tagging_loss=0.01169, over 3056529.30 frames. ], batch size: 66, lr: 1.59e-02, grad_scale: 32.0 2023-11-18 15:50:51,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=298520.0, ans=0.2 2023-11-18 15:51:07,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=298586.6666666667, ans=0.0 2023-11-18 15:51:08,488 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.40 vs. limit=22.5 2023-11-18 15:51:27,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=298720.0, ans=0.0 2023-11-18 15:51:33,493 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 8750, loss[loss=0.122, simple_loss=0.1429, pruned_loss=0.0411, audio_tagging_loss=0.009415, over 14660.00 frames. ], tot_loss[loss=0.109, simple_loss=0.1227, pruned_loss=0.03581, audio_tagging_loss=0.01185, over 3054059.19 frames. 
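The recurring optim.py lines ("Clipping_scale=2.0, grad-norm quartiles ... threshold=..., percent-clipped=...") report five order statistics of recent gradient norms (min, 25%, median, 75%, max) plus the clipping threshold. In every instance in this log the threshold is clipping_scale times the median, e.g. 2.0 * 1.078e+02 = 2.156e+02 against the logged 2.155e+02 above, so the clipping appears to adapt to a running median of gradient norms. A minimal sketch of that bookkeeping follows; the history length and exact update policy are assumptions, not the actual optimizer code.

    import torch
    from collections import deque

    class MedianGradClipper:
        # Keeps a window of recent global gradient norms, clips whenever the
        # current norm exceeds clipping_scale * median, and tracks the
        # percent-clipped statistic that shows up in the log.
        def __init__(self, clipping_scale: float = 2.0, history: int = 1000):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=history)
            self.seen = 0
            self.clipped = 0

        def step(self, params) -> torch.Tensor:
            grads = [p.grad.norm() for p in params if p.grad is not None]
            norm = torch.norm(torch.stack(grads))
            self.norms.append(norm.item())
            q = torch.quantile(torch.tensor(list(self.norms)),
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * q[2].item()
            self.seen += 1
            if norm.item() > threshold:
                self.clipped += 1
                for p in params:
                    if p.grad is not None:
                        p.grad.mul_(threshold / norm.item())
            return q  # the five values printed as "grad-norm quartiles"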
], batch size: 53, lr: 1.59e-02, grad_scale: 32.0 2023-11-18 15:51:48,793 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.159e+01 9.840e+01 1.091e+02 1.232e+02 1.815e+02, threshold=2.181e+02, percent-clipped=0.0 2023-11-18 15:51:49,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=298853.3333333333, ans=0.1 2023-11-18 15:51:50,293 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.88 vs. limit=15.0 2023-11-18 15:52:28,374 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 8800, loss[loss=0.09413, simple_loss=0.1111, pruned_loss=0.02699, audio_tagging_loss=0.01157, over 16213.00 frames. ], tot_loss[loss=0.1081, simple_loss=0.1216, pruned_loss=0.03533, audio_tagging_loss=0.01197, over 3055218.84 frames. ], batch size: 58, lr: 1.59e-02, grad_scale: 32.0 2023-11-18 15:52:49,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=299253.3333333333, ans=0.125 2023-11-18 15:52:56,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=299253.3333333333, ans=0.125 2023-11-18 15:53:01,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=299320.0, ans=0.125 2023-11-18 15:53:20,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=299386.6666666667, ans=0.125 2023-11-18 15:53:22,534 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 8850, loss[loss=0.1297, simple_loss=0.1515, pruned_loss=0.04385, audio_tagging_loss=0.01014, over 16132.00 frames. ], tot_loss[loss=0.1084, simple_loss=0.1217, pruned_loss=0.03553, audio_tagging_loss=0.01199, over 3057873.48 frames. ], batch size: 60, lr: 1.59e-02, grad_scale: 32.0 2023-11-18 15:53:32,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=299453.3333333333, ans=0.2 2023-11-18 15:53:32,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=299453.3333333333, ans=0.0 2023-11-18 15:53:35,237 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 15:53:35,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=299520.0, ans=0.0 2023-11-18 15:53:38,346 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.457e+01 9.407e+01 1.047e+02 1.181e+02 1.757e+02, threshold=2.094e+02, percent-clipped=0.0 2023-11-18 15:53:40,746 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.30 vs. 
limit=15.0 2023-11-18 15:53:43,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=299520.0, ans=0.125 2023-11-18 15:53:44,792 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.99 vs. limit=15.0 2023-11-18 15:53:57,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=299653.3333333333, ans=0.0 2023-11-18 15:54:02,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=299653.3333333333, ans=0.125 2023-11-18 15:54:04,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=299653.3333333333, ans=0.125 2023-11-18 15:54:16,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=299720.0, ans=0.2 2023-11-18 15:54:17,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=299786.6666666667, ans=0.125 2023-11-18 15:54:17,932 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 8900, loss[loss=0.07714, simple_loss=0.08058, pruned_loss=0.0262, audio_tagging_loss=0.01065, over 14557.00 frames. ], tot_loss[loss=0.1078, simple_loss=0.1214, pruned_loss=0.03529, audio_tagging_loss=0.0118, over 3060576.82 frames. ], batch size: 56, lr: 1.59e-02, grad_scale: 32.0 2023-11-18 15:54:39,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=299920.0, ans=0.0 2023-11-18 15:54:57,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=299986.6666666667, ans=0.0 2023-11-18 15:55:02,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=300053.3333333333, ans=0.125 2023-11-18 15:55:10,024 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.93 vs. limit=15.0 2023-11-18 15:55:12,594 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 8950, loss[loss=0.1077, simple_loss=0.1187, pruned_loss=0.03662, audio_tagging_loss=0.01175, over 15981.00 frames. ], tot_loss[loss=0.1081, simple_loss=0.1218, pruned_loss=0.03557, audio_tagging_loss=0.01165, over 3063506.35 frames. ], batch size: 60, lr: 1.59e-02, grad_scale: 32.0 2023-11-18 15:55:16,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=300120.0, ans=0.125 2023-11-18 15:55:19,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=300120.0, ans=0.0 2023-11-18 15:55:27,256 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.282e+01 9.233e+01 1.016e+02 1.150e+02 1.659e+02, threshold=2.033e+02, percent-clipped=0.0 2023-11-18 15:56:03,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=300386.6666666667, ans=0.1 2023-11-18 15:56:06,848 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 9000, loss[loss=0.12, simple_loss=0.1502, pruned_loss=0.03291, audio_tagging_loss=0.01201, over 15473.00 frames. 
], tot_loss[loss=0.1098, simple_loss=0.1241, pruned_loss=0.03638, audio_tagging_loss=0.0114, over 3060844.99 frames. ], batch size: 57, lr: 1.59e-02, grad_scale: 16.0 2023-11-18 15:56:06,849 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-18 15:56:20,135 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([4.2936, 3.2069, 3.3505, 4.3018, 3.4818, 3.4901, 3.9653, 3.3843], device='cuda:3') 2023-11-18 15:56:40,138 INFO [train_asr.py:1147] (3/4) Epoch 4, validation: loss=0.07668, simple_loss=0.06181, pruned_loss=0.009869, audio_tagging_loss=0.03591, over 4681554.00 frames. 2023-11-18 15:56:40,139 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-18 15:56:51,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=300520.0, ans=0.125 2023-11-18 15:57:07,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=300586.6666666667, ans=0.2 2023-11-18 15:57:08,037 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2023-11-18 15:57:20,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=300653.3333333333, ans=0.1 2023-11-18 15:57:32,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=300720.0, ans=0.0 2023-11-18 15:57:34,506 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 9050, loss[loss=0.1334, simple_loss=0.1535, pruned_loss=0.04595, audio_tagging_loss=0.01071, over 15767.00 frames. ], tot_loss[loss=0.1089, simple_loss=0.123, pruned_loss=0.03589, audio_tagging_loss=0.01146, over 3070921.23 frames. ], batch size: 57, lr: 1.59e-02, grad_scale: 16.0 2023-11-18 15:57:50,189 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.867e+01 9.255e+01 1.039e+02 1.147e+02 2.056e+02, threshold=2.078e+02, percent-clipped=1.0 2023-11-18 15:57:57,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=300920.0, ans=0.125 2023-11-18 15:58:12,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=300986.6666666667, ans=0.0 2023-11-18 15:58:12,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=300986.6666666667, ans=0.05 2023-11-18 15:58:28,443 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 9100, loss[loss=0.1124, simple_loss=0.1246, pruned_loss=0.03865, audio_tagging_loss=0.0115, over 14278.00 frames. ], tot_loss[loss=0.1083, simple_loss=0.1222, pruned_loss=0.03564, audio_tagging_loss=0.0115, over 3061515.25 frames. 
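During the validation pass above, zipformer.py dumps attn_weights_entropy tensors for named self_attn_weights modules, apparently one value per attention head (eight heads in the tensor above). A plausible reading is sketched below, under the assumption that the weights have shape (batch, heads, query, key) and are already softmax-normalized over the key axis: the entropy of each head's attention distribution, averaged over batch and query positions. High values mean diffuse attention, values near zero mean sharply peaked attention. This is a reconstruction of the diagnostic, not the exact code.

    import torch

    def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
        # attn_weights: (batch, heads, query, key), rows summing to 1 along key.
        eps = 1.0e-20
        entropy = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
        return entropy.mean(dim=(0, 2))  # one entropy value per head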
], batch size: 53, lr: 1.59e-02, grad_scale: 16.0 2023-11-18 15:58:47,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=301186.6666666667, ans=0.125 2023-11-18 15:58:53,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=301253.3333333333, ans=0.1 2023-11-18 15:59:03,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=301320.0, ans=0.125 2023-11-18 15:59:09,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=301320.0, ans=0.0 2023-11-18 15:59:11,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=301386.6666666667, ans=0.125 2023-11-18 15:59:23,957 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 9150, loss[loss=0.07075, simple_loss=0.08943, pruned_loss=0.01707, audio_tagging_loss=0.008969, over 14737.00 frames. ], tot_loss[loss=0.1077, simple_loss=0.1218, pruned_loss=0.03542, audio_tagging_loss=0.01142, over 3056745.95 frames. ], batch size: 55, lr: 1.59e-02, grad_scale: 16.0 2023-11-18 15:59:25,098 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.89 vs. limit=15.0 2023-11-18 15:59:26,223 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.14 vs. limit=15.0 2023-11-18 15:59:31,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=301453.3333333333, ans=0.2 2023-11-18 15:59:36,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=301520.0, ans=0.2 2023-11-18 15:59:40,132 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.68 vs. 
limit=15.0 2023-11-18 15:59:41,556 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.692e+01 9.042e+01 1.024e+02 1.134e+02 1.471e+02, threshold=2.048e+02, percent-clipped=0.0 2023-11-18 15:59:41,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=301520.0, ans=0.125 2023-11-18 15:59:42,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=301520.0, ans=0.1 2023-11-18 15:59:43,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=301520.0, ans=0.2 2023-11-18 15:59:50,325 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=301586.6666666667, ans=0.125 2023-11-18 15:59:53,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=301586.6666666667, ans=0.125 2023-11-18 15:59:57,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=301653.3333333333, ans=0.0 2023-11-18 15:59:58,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=301653.3333333333, ans=0.0 2023-11-18 16:00:02,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=301653.3333333333, ans=0.2 2023-11-18 16:00:14,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=301720.0, ans=0.0 2023-11-18 16:00:19,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=301720.0, ans=0.2 2023-11-18 16:00:20,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=301786.6666666667, ans=0.0 2023-11-18 16:00:21,039 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 9200, loss[loss=0.09174, simple_loss=0.09627, pruned_loss=0.02947, audio_tagging_loss=0.01414, over 14218.00 frames. ], tot_loss[loss=0.1085, simple_loss=0.1228, pruned_loss=0.03577, audio_tagging_loss=0.01131, over 3066527.27 frames. ], batch size: 57, lr: 1.59e-02, grad_scale: 32.0 2023-11-18 16:00:23,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=301786.6666666667, ans=0.125 2023-11-18 16:00:44,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=301920.0, ans=0.125 2023-11-18 16:00:56,909 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.63 vs. limit=15.0 2023-11-18 16:01:08,126 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=15.0 2023-11-18 16:01:08,287 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.84 vs. limit=15.0 2023-11-18 16:01:16,224 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 9250, loss[loss=0.07565, simple_loss=0.09748, pruned_loss=0.01561, audio_tagging_loss=0.01131, over 15005.00 frames. 
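Most of the scaling.py:213 lines record a ScheduledFloat: a scalar hyperparameter (a dropout probability, a skip rate, a balancer bound) whose value is a deterministic function of batch_count, with the ans=... field holding the schedule's current value. A minimal piecewise-linear reimplementation is sketched below; the breakpoint format and the example schedule are assumptions for illustration.

    class ScheduledFloat:
        # Piecewise-linear scalar schedule over batch_count, held constant
        # outside the breakpoint range, mirroring the values that scaling.py
        # prints (e.g. dropout_p with ans=0.1 late in training, as above).
        def __init__(self, *points):
            self.points = sorted(points)  # (batch_count, value) pairs

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # Example: a dropout decaying from 0.3 to 0.1 over the first 20k batches
    # would read 0.1 at batch_count=301520.0, matching the log above.
    assert ScheduledFloat((0.0, 0.3), (20000.0, 0.1)).value(301520.0) == 0.1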
], tot_loss[loss=0.1078, simple_loss=0.1219, pruned_loss=0.0355, audio_tagging_loss=0.01136, over 3057983.41 frames. ], batch size: 57, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:01:19,759 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0 2023-11-18 16:01:21,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=302120.0, ans=0.125 2023-11-18 16:01:33,127 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.552e+01 9.477e+01 1.067e+02 1.208e+02 1.657e+02, threshold=2.134e+02, percent-clipped=0.0 2023-11-18 16:01:33,901 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.30 vs. limit=22.5 2023-11-18 16:02:03,444 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.77 vs. limit=10.0 2023-11-18 16:02:04,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=302386.6666666667, ans=0.05 2023-11-18 16:02:11,899 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 9300, loss[loss=0.07296, simple_loss=0.08248, pruned_loss=0.0175, audio_tagging_loss=0.01422, over 14686.00 frames. ], tot_loss[loss=0.1068, simple_loss=0.1207, pruned_loss=0.03494, audio_tagging_loss=0.01153, over 3061734.18 frames. ], batch size: 58, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:02:12,400 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.56 vs. limit=15.0 2023-11-18 16:02:19,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=302453.3333333333, ans=0.2 2023-11-18 16:02:38,628 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.46 vs. limit=6.0 2023-11-18 16:02:53,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=302653.3333333333, ans=0.5 2023-11-18 16:02:57,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=302720.0, ans=0.0 2023-11-18 16:03:05,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=302720.0, ans=0.0 2023-11-18 16:03:09,204 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 9350, loss[loss=0.09736, simple_loss=0.1144, pruned_loss=0.02851, audio_tagging_loss=0.01167, over 16667.00 frames. ], tot_loss[loss=0.1061, simple_loss=0.1193, pruned_loss=0.03482, audio_tagging_loss=0.01168, over 3062439.17 frames. 
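The scaling.py:1022 "Whitening" lines periodically sample an activation statistic and print it against the module's configured limit (e.g. "metric=10.79 vs. limit=15.0" above is within bounds); when the metric exceeds the limit, the module is understood to apply a corrective gradient that pushes features back toward whiteness. One plausible form of the metric is sketched below: for zero-mean features it equals 1.0 when the channel covariance is a multiple of the identity (perfectly whitened) and grows with the eigenvalue spread. This is a reconstruction under that assumption, not the exact implementation.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        # x: (..., num_channels); channels split into num_groups groups, as in
        # the logged "num_groups=..., num_channels=..." fields.
        num_channels = x.shape[-1]
        assert num_channels % num_groups == 0
        x = x.reshape(-1, num_groups, num_channels // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = torch.matmul(x.transpose(1, 2), x) / x.shape[1]  # (groups, c, c)
        num = (cov * cov).sum(dim=(1, 2)) * cov.shape[-1]      # trace(C @ C) * c
        den = cov.diagonal(dim1=1, dim2=2).sum(dim=1) ** 2     # trace(C) ** 2
        return (num / den).mean()  # 1.0 when C is proportional to the identity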
], batch size: 61, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:03:15,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=302786.6666666667, ans=0.1 2023-11-18 16:03:17,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=302786.6666666667, ans=0.0 2023-11-18 16:03:21,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=302853.3333333333, ans=0.0 2023-11-18 16:03:22,254 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=17.03 vs. limit=15.0 2023-11-18 16:03:24,974 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.240e+01 9.020e+01 1.030e+02 1.167e+02 1.548e+02, threshold=2.059e+02, percent-clipped=0.0 2023-11-18 16:03:47,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=302986.6666666667, ans=0.125 2023-11-18 16:03:58,470 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.33 vs. limit=15.0 2023-11-18 16:04:00,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=303053.3333333333, ans=0.125 2023-11-18 16:04:04,195 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 9400, loss[loss=0.08381, simple_loss=0.08416, pruned_loss=0.02603, audio_tagging_loss=0.0157, over 14973.00 frames. ], tot_loss[loss=0.1058, simple_loss=0.1185, pruned_loss=0.03478, audio_tagging_loss=0.01179, over 3057247.55 frames. ], batch size: 57, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:04:13,224 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.46 vs. limit=15.0 2023-11-18 16:04:18,985 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=22.5 2023-11-18 16:04:30,695 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.10 vs. limit=22.5 2023-11-18 16:04:32,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=303253.3333333333, ans=0.0 2023-11-18 16:04:49,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=303386.6666666667, ans=0.05 2023-11-18 16:04:53,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=303386.6666666667, ans=0.125 2023-11-18 16:05:00,033 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 9450, loss[loss=0.09603, simple_loss=0.1069, pruned_loss=0.03026, audio_tagging_loss=0.01232, over 14435.00 frames. ], tot_loss[loss=0.1072, simple_loss=0.1198, pruned_loss=0.03541, audio_tagging_loss=0.01189, over 3056924.14 frames. ], batch size: 55, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:05:00,070 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 16:05:00,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=303453.3333333333, ans=0.0 2023-11-18 16:05:16,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=303520.0, ans=0.0 2023-11-18 16:05:16,711 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0 2023-11-18 16:05:17,120 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.847e+01 9.577e+01 1.061e+02 1.222e+02 1.461e+02, threshold=2.121e+02, percent-clipped=0.0 2023-11-18 16:05:21,550 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.26 vs. limit=15.0 2023-11-18 16:05:37,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=303653.3333333333, ans=0.125 2023-11-18 16:05:40,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=303653.3333333333, ans=0.125 2023-11-18 16:05:42,192 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.30 vs. limit=12.0 2023-11-18 16:05:46,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=303720.0, ans=0.0 2023-11-18 16:05:47,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=303720.0, ans=0.2 2023-11-18 16:05:56,426 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 9500, loss[loss=0.09798, simple_loss=0.1064, pruned_loss=0.03157, audio_tagging_loss=0.01321, over 15739.00 frames. ], tot_loss[loss=0.1069, simple_loss=0.1194, pruned_loss=0.03526, audio_tagging_loss=0.01195, over 3050769.22 frames. ], batch size: 61, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:06:11,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=303853.3333333333, ans=0.2 2023-11-18 16:06:20,704 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.67 vs. limit=6.0 2023-11-18 16:06:39,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=303986.6666666667, ans=0.0 2023-11-18 16:06:41,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=304053.3333333333, ans=0.2 2023-11-18 16:06:52,132 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 9550, loss[loss=0.08603, simple_loss=0.09327, pruned_loss=0.02287, audio_tagging_loss=0.01652, over 15475.00 frames. ], tot_loss[loss=0.107, simple_loss=0.1195, pruned_loss=0.03518, audio_tagging_loss=0.01209, over 3049527.86 frames. 
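The WARNING above is the recurring AudioSet filter: clips from the audio-tagging corpus carry only a placeholder transcript, and for this one-second clip the 100 input frames reduce to 23 frames after the encoder's roughly 4x subsampling, fewer than its 24 BPE tokens, so a transducer loss over it would be ill-posed and the cut is excluded. A sketch of that check follows; the subsampling formula is an assumption, chosen because it does map 100 input frames to the logged 23.

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Estimated encoder output length after the convolutional front end
        # (assumed formula; it reproduces the logged 100 -> 23 reduction).
        frames_after_subsampling = ((num_frames - 7) // 2 + 1) // 2
        # A transducer needs at least as many output frames as tokens.
        return frames_after_subsampling >= num_tokens

    # The logged case: 23 frames after subsampling < 24 tokens, so the cut
    # with the dummy "place holder" text is dropped from training.
    assert keep_cut(100, 24) is False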
], batch size: 58, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:07:08,531 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 9.602e+01 1.044e+02 1.160e+02 1.697e+02, threshold=2.089e+02, percent-clipped=0.0 2023-11-18 16:07:11,836 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.53 vs. limit=12.0 2023-11-18 16:07:21,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=304253.3333333333, ans=0.0 2023-11-18 16:07:31,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=304320.0, ans=0.2 2023-11-18 16:07:48,092 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 9600, loss[loss=0.09222, simple_loss=0.1026, pruned_loss=0.0286, audio_tagging_loss=0.01233, over 14573.00 frames. ], tot_loss[loss=0.1078, simple_loss=0.1205, pruned_loss=0.03547, audio_tagging_loss=0.01205, over 3053453.07 frames. ], batch size: 55, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:08:09,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=304586.6666666667, ans=0.1 2023-11-18 16:08:21,757 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=12.0 2023-11-18 16:08:44,185 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 9650, loss[loss=0.1432, simple_loss=0.1632, pruned_loss=0.05153, audio_tagging_loss=0.01007, over 15940.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.1198, pruned_loss=0.03512, audio_tagging_loss=0.01203, over 3041944.18 frames. ], batch size: 59, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:09:00,540 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.907e+01 9.312e+01 1.013e+02 1.091e+02 1.612e+02, threshold=2.027e+02, percent-clipped=0.0 2023-11-18 16:09:01,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=304853.3333333333, ans=0.1 2023-11-18 16:09:03,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=304853.3333333333, ans=0.0 2023-11-18 16:09:15,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=304920.0, ans=0.1 2023-11-18 16:09:15,713 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.43 vs. limit=15.0 2023-11-18 16:09:23,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=304986.6666666667, ans=0.0 2023-11-18 16:09:34,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=305053.3333333333, ans=0.0 2023-11-18 16:09:39,308 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 9700, loss[loss=0.1135, simple_loss=0.1281, pruned_loss=0.03677, audio_tagging_loss=0.01268, over 14445.00 frames. ], tot_loss[loss=0.1081, simple_loss=0.1214, pruned_loss=0.0356, audio_tagging_loss=0.01183, over 3044097.86 frames. 
], batch size: 55, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:09:48,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=305120.0, ans=0.125 2023-11-18 16:09:51,233 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.57 vs. limit=15.0 2023-11-18 16:09:52,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=305186.6666666667, ans=0.1 2023-11-18 16:10:07,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=305253.3333333333, ans=0.125 2023-11-18 16:10:14,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=305320.0, ans=0.125 2023-11-18 16:10:22,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=305320.0, ans=0.2 2023-11-18 16:10:30,946 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.80 vs. limit=15.0 2023-11-18 16:10:35,411 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 9750, loss[loss=0.08633, simple_loss=0.09447, pruned_loss=0.02807, audio_tagging_loss=0.01103, over 13814.00 frames. ], tot_loss[loss=0.1068, simple_loss=0.1203, pruned_loss=0.03501, audio_tagging_loss=0.01169, over 3044292.26 frames. ], batch size: 53, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:10:38,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=305453.3333333333, ans=0.0 2023-11-18 16:10:50,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=305520.0, ans=0.125 2023-11-18 16:10:53,086 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.261e+01 9.337e+01 1.028e+02 1.130e+02 1.491e+02, threshold=2.056e+02, percent-clipped=0.0 2023-11-18 16:11:14,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=305653.3333333333, ans=0.05 2023-11-18 16:11:16,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=305653.3333333333, ans=0.125 2023-11-18 16:11:32,499 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 9800, loss[loss=0.1212, simple_loss=0.1436, pruned_loss=0.0404, audio_tagging_loss=0.009049, over 15827.00 frames. ], tot_loss[loss=0.1072, simple_loss=0.1211, pruned_loss=0.03512, audio_tagging_loss=0.01156, over 3036526.41 frames. ], batch size: 57, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:11:33,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=305786.6666666667, ans=0.125 2023-11-18 16:11:44,853 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.27 vs. 
limit=12.0 2023-11-18 16:11:50,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=305853.3333333333, ans=0.5 2023-11-18 16:11:51,898 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=305853.3333333333, ans=0.0 2023-11-18 16:12:04,981 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.39 vs. limit=15.0 2023-11-18 16:12:06,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=305986.6666666667, ans=0.1 2023-11-18 16:12:11,008 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=15.0 2023-11-18 16:12:23,804 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 16:12:28,132 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 9850, loss[loss=0.1229, simple_loss=0.1406, pruned_loss=0.04359, audio_tagging_loss=0.008969, over 14507.00 frames. ], tot_loss[loss=0.1069, simple_loss=0.1209, pruned_loss=0.03496, audio_tagging_loss=0.01147, over 3041625.47 frames. ], batch size: 55, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:12:30,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=306120.0, ans=0.0 2023-11-18 16:12:31,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=306120.0, ans=0.2 2023-11-18 16:12:31,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=306120.0, ans=0.04949747468305833 2023-11-18 16:12:45,060 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.089e+01 9.433e+01 1.029e+02 1.148e+02 1.487e+02, threshold=2.058e+02, percent-clipped=0.0 2023-11-18 16:12:46,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=306186.6666666667, ans=0.125 2023-11-18 16:12:53,761 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.58 vs. 
limit=22.5 2023-11-18 16:13:02,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=306320.0, ans=0.1 2023-11-18 16:13:10,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=306320.0, ans=0.2 2023-11-18 16:13:17,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=306386.6666666667, ans=0.125 2023-11-18 16:13:19,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=306386.6666666667, ans=0.125 2023-11-18 16:13:23,973 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 9900, loss[loss=0.1005, simple_loss=0.1083, pruned_loss=0.03355, audio_tagging_loss=0.01286, over 14714.00 frames. ], tot_loss[loss=0.1061, simple_loss=0.12, pruned_loss=0.03468, audio_tagging_loss=0.01141, over 3036024.46 frames. ], batch size: 57, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:13:48,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=306586.6666666667, ans=0.125 2023-11-18 16:14:20,559 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 9950, loss[loss=0.1027, simple_loss=0.1198, pruned_loss=0.03121, audio_tagging_loss=0.01157, over 15218.00 frames. ], tot_loss[loss=0.1064, simple_loss=0.1206, pruned_loss=0.03475, audio_tagging_loss=0.01135, over 3046437.23 frames. ], batch size: 58, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:14:27,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=306786.6666666667, ans=0.125 2023-11-18 16:14:29,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=306786.6666666667, ans=0.2 2023-11-18 16:14:30,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=306853.3333333333, ans=10.0 2023-11-18 16:14:34,982 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.94 vs. limit=22.5 2023-11-18 16:14:36,439 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.909e+01 9.576e+01 1.088e+02 1.219e+02 1.506e+02, threshold=2.175e+02, percent-clipped=0.0 2023-11-18 16:14:36,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=306853.3333333333, ans=0.0 2023-11-18 16:14:39,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=306853.3333333333, ans=0.2 2023-11-18 16:15:03,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=306986.6666666667, ans=0.125 2023-11-18 16:15:15,744 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 10000, loss[loss=0.1324, simple_loss=0.1414, pruned_loss=0.04944, audio_tagging_loss=0.01221, over 15379.00 frames. ], tot_loss[loss=0.1057, simple_loss=0.1198, pruned_loss=0.03426, audio_tagging_loss=0.0115, over 3042671.39 frames. 
], batch size: 57, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:15:22,275 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 16:15:31,422 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.32 vs. limit=15.0 2023-11-18 16:16:04,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=307386.6666666667, ans=0.1 2023-11-18 16:16:09,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=307386.6666666667, ans=0.0 2023-11-18 16:16:11,297 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 10050, loss[loss=0.1562, simple_loss=0.1741, pruned_loss=0.05937, audio_tagging_loss=0.009757, over 14878.00 frames. ], tot_loss[loss=0.1065, simple_loss=0.1206, pruned_loss=0.03471, audio_tagging_loss=0.01151, over 3044428.06 frames. ], batch size: 53, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:16:16,751 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.93 vs. limit=15.0 2023-11-18 16:16:19,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=307453.3333333333, ans=0.125 2023-11-18 16:16:29,453 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.409e+01 9.427e+01 1.040e+02 1.141e+02 1.376e+02, threshold=2.079e+02, percent-clipped=0.0 2023-11-18 16:17:05,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=307720.0, ans=15.0 2023-11-18 16:17:05,296 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2023-11-18 16:17:05,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=307720.0, ans=0.1 2023-11-18 16:17:08,296 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 10100, loss[loss=0.06768, simple_loss=0.07709, pruned_loss=0.01476, audio_tagging_loss=0.01437, over 14639.00 frames. ], tot_loss[loss=0.1062, simple_loss=0.1202, pruned_loss=0.03446, audio_tagging_loss=0.01163, over 3048236.56 frames. ], batch size: 56, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:17:14,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=307786.6666666667, ans=0.035 2023-11-18 16:17:31,835 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.91 vs. limit=15.0 2023-11-18 16:17:38,292 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.16 vs. limit=22.5 2023-11-18 16:17:41,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=307986.6666666667, ans=0.2 2023-11-18 16:17:55,294 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 16:17:55,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=308053.3333333333, ans=0.0 2023-11-18 16:18:00,743 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 16:18:03,729 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 10150, loss[loss=0.1282, simple_loss=0.1472, pruned_loss=0.04396, audio_tagging_loss=0.01064, over 15152.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.1212, pruned_loss=0.03486, audio_tagging_loss=0.0116, over 3050289.44 frames. ], batch size: 58, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:18:19,760 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 9.614e+01 1.045e+02 1.146e+02 1.690e+02, threshold=2.090e+02, percent-clipped=0.0 2023-11-18 16:18:31,567 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 16:18:46,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=308320.0, ans=0.125 2023-11-18 16:18:59,159 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 10200, loss[loss=0.1388, simple_loss=0.1605, pruned_loss=0.04599, audio_tagging_loss=0.01252, over 15177.00 frames. ], tot_loss[loss=0.1076, simple_loss=0.1214, pruned_loss=0.03517, audio_tagging_loss=0.01172, over 3048014.30 frames. ], batch size: 56, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:18:59,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=308453.3333333333, ans=0.1 2023-11-18 16:19:12,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=308520.0, ans=0.2 2023-11-18 16:19:21,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=308586.6666666667, ans=0.2 2023-11-18 16:19:22,776 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 16:19:42,622 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.04 vs. 
limit=15.0 2023-11-18 16:19:43,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=308720.0, ans=0.0 2023-11-18 16:19:45,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=308720.0, ans=0.125 2023-11-18 16:19:55,121 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 10250, loss[loss=0.09764, simple_loss=0.112, pruned_loss=0.02908, audio_tagging_loss=0.01256, over 14412.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.1207, pruned_loss=0.03495, audio_tagging_loss=0.01187, over 3048622.92 frames. ], batch size: 54, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:20:12,687 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 9.476e+01 1.039e+02 1.199e+02 1.617e+02, threshold=2.078e+02, percent-clipped=0.0 2023-11-18 16:20:20,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=308920.0, ans=0.125 2023-11-18 16:20:25,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=308920.0, ans=0.125 2023-11-18 16:20:25,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=308920.0, ans=0.125 2023-11-18 16:20:28,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=308986.6666666667, ans=0.125 2023-11-18 16:20:34,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=308986.6666666667, ans=0.0 2023-11-18 16:20:43,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=309053.3333333333, ans=0.1 2023-11-18 16:20:45,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=309053.3333333333, ans=0.0 2023-11-18 16:20:45,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=309053.3333333333, ans=10.0 2023-11-18 16:20:48,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=309053.3333333333, ans=0.0 2023-11-18 16:20:51,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=309120.0, ans=0.1 2023-11-18 16:20:51,932 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 10300, loss[loss=0.1239, simple_loss=0.1452, pruned_loss=0.03909, audio_tagging_loss=0.01218, over 15409.00 frames. ], tot_loss[loss=0.1074, simple_loss=0.1207, pruned_loss=0.03517, audio_tagging_loss=0.01185, over 3056307.20 frames. 
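Note on the "Exclude cut" warnings: the excluded cuts are AudioSet audio-tagging clips (IDs under unbalanced/) that carry a placeholder transcript, and after the encoder's roughly 4x subsampling a 1-second clip of 100 feature frames yields only 23 output frames, fewer than the 24 BPE tokens, so the transducer loss would be ill-defined and the cut is dropped. A minimal sketch of the predicate follows; the function name and the exact subsampling expression are assumptions, chosen only to reproduce the 100 -> 23 mapping in the warnings.

    # Illustrative version of the filter behind the "Exclude cut" warnings.
    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Frame count after the convolutional front-end's ~4x subsampling;
        # this expression matches the logged "before: 100, after: 23" pair
        # but is an assumption, not the exact icefall helper.
        frames_after_subsampling = ((num_frames - 7) // 2 + 1) // 2
        return frames_after_subsampling >= num_tokens

    print(keep_cut(100, 24))  # False: 23 frames cannot align 24 tokens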
], batch size: 57, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:20:56,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=309120.0, ans=0.125 2023-11-18 16:20:57,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=309120.0, ans=0.05 2023-11-18 16:21:18,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=309253.3333333333, ans=0.5 2023-11-18 16:21:20,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=309253.3333333333, ans=0.0 2023-11-18 16:21:36,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=309386.6666666667, ans=0.1 2023-11-18 16:21:39,613 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.55 vs. limit=15.0 2023-11-18 16:21:44,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=309386.6666666667, ans=0.1 2023-11-18 16:21:45,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=309386.6666666667, ans=0.2 2023-11-18 16:21:47,524 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 10350, loss[loss=0.113, simple_loss=0.1316, pruned_loss=0.0359, audio_tagging_loss=0.01126, over 14219.00 frames. ], tot_loss[loss=0.1069, simple_loss=0.1199, pruned_loss=0.0349, audio_tagging_loss=0.01201, over 3048667.31 frames. ], batch size: 56, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:22:04,394 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.642e+01 9.661e+01 1.063e+02 1.175e+02 1.992e+02, threshold=2.126e+02, percent-clipped=0.0 2023-11-18 16:22:26,278 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.56 vs. limit=6.0 2023-11-18 16:22:27,432 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.47 vs. limit=6.0 2023-11-18 16:22:35,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=309720.0, ans=0.125 2023-11-18 16:22:39,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=309720.0, ans=0.2 2023-11-18 16:22:43,318 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 10400, loss[loss=0.1237, simple_loss=0.1389, pruned_loss=0.0447, audio_tagging_loss=0.009547, over 15292.00 frames. ], tot_loss[loss=0.1077, simple_loss=0.1204, pruned_loss=0.03527, audio_tagging_loss=0.01221, over 3048943.44 frames. 
], batch size: 56, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:22:54,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=309853.3333333333, ans=0.125 2023-11-18 16:22:55,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=309853.3333333333, ans=0.125 2023-11-18 16:23:20,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=309986.6666666667, ans=0.125 2023-11-18 16:23:34,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=310053.3333333333, ans=0.0 2023-11-18 16:23:35,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=310053.3333333333, ans=0.125 2023-11-18 16:23:39,767 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 10450, loss[loss=0.106, simple_loss=0.1128, pruned_loss=0.03738, audio_tagging_loss=0.01227, over 14066.00 frames. ], tot_loss[loss=0.1065, simple_loss=0.1192, pruned_loss=0.03472, audio_tagging_loss=0.01215, over 3047242.68 frames. ], batch size: 52, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:23:52,762 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.30 vs. limit=15.0 2023-11-18 16:23:56,251 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.612e+01 9.086e+01 9.811e+01 1.148e+02 1.710e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-18 16:23:58,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=310186.6666666667, ans=0.125 2023-11-18 16:24:03,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=310253.3333333333, ans=0.0 2023-11-18 16:24:32,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=310386.6666666667, ans=0.125 2023-11-18 16:24:35,558 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 10500, loss[loss=0.1314, simple_loss=0.153, pruned_loss=0.04665, audio_tagging_loss=0.008209, over 16448.00 frames. ], tot_loss[loss=0.1063, simple_loss=0.1195, pruned_loss=0.03469, audio_tagging_loss=0.01186, over 3046564.97 frames. ], batch size: 58, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:25:07,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=310586.6666666667, ans=0.0 2023-11-18 16:25:08,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=310653.3333333333, ans=0.0 2023-11-18 16:25:25,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=310720.0, ans=0.1 2023-11-18 16:25:27,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=310720.0, ans=0.09899494936611666 2023-11-18 16:25:32,010 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 10550, loss[loss=0.109, simple_loss=0.1263, pruned_loss=0.03667, audio_tagging_loss=0.009193, over 15527.00 frames. ], tot_loss[loss=0.1069, simple_loss=0.1206, pruned_loss=0.03488, audio_tagging_loss=0.01168, over 3046503.28 frames. 
], batch size: 59, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:25:33,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=310786.6666666667, ans=0.125 2023-11-18 16:25:35,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=310786.6666666667, ans=0.125 2023-11-18 16:25:41,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=310786.6666666667, ans=15.0 2023-11-18 16:25:47,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=310853.3333333333, ans=0.2 2023-11-18 16:25:49,163 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.090e+01 9.120e+01 1.006e+02 1.112e+02 1.547e+02, threshold=2.011e+02, percent-clipped=0.0 2023-11-18 16:26:05,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=310986.6666666667, ans=0.1 2023-11-18 16:26:15,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=310986.6666666667, ans=0.125 2023-11-18 16:26:28,616 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 10600, loss[loss=0.1022, simple_loss=0.1126, pruned_loss=0.03625, audio_tagging_loss=0.009638, over 15199.00 frames. ], tot_loss[loss=0.1055, simple_loss=0.1191, pruned_loss=0.03426, audio_tagging_loss=0.01172, over 3044770.11 frames. ], batch size: 57, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:26:29,137 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.12 vs. limit=22.5 2023-11-18 16:26:39,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=311186.6666666667, ans=0.125 2023-11-18 16:27:12,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=311386.6666666667, ans=0.125 2023-11-18 16:27:16,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=311386.6666666667, ans=0.025 2023-11-18 16:27:24,505 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 10650, loss[loss=0.1254, simple_loss=0.1477, pruned_loss=0.04015, audio_tagging_loss=0.01144, over 15989.00 frames. ], tot_loss[loss=0.1053, simple_loss=0.1184, pruned_loss=0.03433, audio_tagging_loss=0.01174, over 3043353.87 frames. 
], batch size: 57, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:27:40,906 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.701e+01 9.767e+01 1.078e+02 1.173e+02 1.612e+02, threshold=2.157e+02, percent-clipped=0.0 2023-11-18 16:27:53,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=311586.6666666667, ans=0.0 2023-11-18 16:28:03,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=311653.3333333333, ans=0.1 2023-11-18 16:28:09,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=311720.0, ans=0.125 2023-11-18 16:28:20,378 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 10700, loss[loss=0.1065, simple_loss=0.1252, pruned_loss=0.03259, audio_tagging_loss=0.01133, over 15750.00 frames. ], tot_loss[loss=0.1051, simple_loss=0.1186, pruned_loss=0.03412, audio_tagging_loss=0.01173, over 3041439.08 frames. ], batch size: 59, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:28:41,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=311920.0, ans=0.125 2023-11-18 16:28:45,388 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.84 vs. limit=10.0 2023-11-18 16:28:50,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=311920.0, ans=0.2 2023-11-18 16:29:11,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=312053.3333333333, ans=0.125 2023-11-18 16:29:17,077 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 10750, loss[loss=0.1316, simple_loss=0.1479, pruned_loss=0.04929, audio_tagging_loss=0.008375, over 15478.00 frames. ], tot_loss[loss=0.1062, simple_loss=0.12, pruned_loss=0.03466, audio_tagging_loss=0.01157, over 3047604.05 frames. ], batch size: 58, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:29:29,386 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.80 vs. limit=6.0 2023-11-18 16:29:33,575 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.544e+01 9.141e+01 9.911e+01 1.128e+02 1.714e+02, threshold=1.982e+02, percent-clipped=0.0 2023-11-18 16:29:36,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=312186.6666666667, ans=0.1 2023-11-18 16:29:36,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=312186.6666666667, ans=0.125 2023-11-18 16:29:44,182 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.21 vs. limit=15.0 2023-11-18 16:29:47,203 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 16:30:12,484 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 10800, loss[loss=0.1109, simple_loss=0.1297, pruned_loss=0.03711, audio_tagging_loss=0.008901, over 15639.00 frames. ], tot_loss[loss=0.1058, simple_loss=0.1193, pruned_loss=0.03447, audio_tagging_loss=0.01163, over 3048882.82 frames. 
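Note on the ScheduledFloat lines: most of the scaling.py entries record hyperparameters, such as dropout probabilities, skip rates and balancer bounds, that are annealed as a function of the global batch_count rather than held fixed; each line prints the parameter's current value (ans) at the current batch_count. Conceptually a ScheduledFloat is a piecewise-linear function of batch count. The sketch below mirrors only that interpolation behaviour; the class interface and the (batch_count, value) breakpoints are invented for illustration and do not match scaling.py's actual API.

    # Minimal piecewise-linear schedule in the spirit of scaling.py's
    # ScheduledFloat; breakpoints below are made up for illustration.
    class ScheduledFloat:
        def __init__(self, *points):
            self.points = sorted(points)  # (batch_count, value) pairs

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p(306320.0))  # 0.1: by batch ~306k the schedules have long flattened out

This is why the dropout_p entries above all read ans=0.1: at this point in training every schedule has reached its final value.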
], batch size: 57, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:30:12,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=312453.3333333333, ans=0.0 2023-11-18 16:30:30,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=312520.0, ans=0.1 2023-11-18 16:30:32,997 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=312520.0, ans=0.2 2023-11-18 16:30:38,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=312586.6666666667, ans=0.125 2023-11-18 16:30:40,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=312586.6666666667, ans=0.0 2023-11-18 16:30:44,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=312586.6666666667, ans=0.2 2023-11-18 16:30:49,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=312653.3333333333, ans=0.125 2023-11-18 16:30:55,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=312653.3333333333, ans=0.1 2023-11-18 16:31:05,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=312720.0, ans=0.07 2023-11-18 16:31:08,848 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 10850, loss[loss=0.1247, simple_loss=0.1349, pruned_loss=0.04749, audio_tagging_loss=0.009805, over 15006.00 frames. ], tot_loss[loss=0.1065, simple_loss=0.12, pruned_loss=0.03483, audio_tagging_loss=0.01169, over 3047483.65 frames. ], batch size: 57, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:31:25,301 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.405e+01 9.301e+01 1.024e+02 1.166e+02 1.801e+02, threshold=2.048e+02, percent-clipped=0.0 2023-11-18 16:31:27,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=312853.3333333333, ans=0.125 2023-11-18 16:31:42,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=312986.6666666667, ans=0.0 2023-11-18 16:31:49,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=312986.6666666667, ans=0.125 2023-11-18 16:31:54,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=313053.3333333333, ans=0.125 2023-11-18 16:32:03,382 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 16:32:03,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=313120.0, ans=0.125 2023-11-18 16:32:04,474 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 10900, loss[loss=0.09443, simple_loss=0.09967, pruned_loss=0.03054, audio_tagging_loss=0.01405, over 14798.00 frames. ], tot_loss[loss=0.1061, simple_loss=0.1197, pruned_loss=0.0345, audio_tagging_loss=0.01176, over 3041105.20 frames. ], batch size: 56, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:32:12,755 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.05 vs. limit=15.0 2023-11-18 16:32:28,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=313253.3333333333, ans=0.125 2023-11-18 16:32:50,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=313386.6666666667, ans=0.2 2023-11-18 16:32:53,610 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.96 vs. limit=6.0 2023-11-18 16:32:56,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=313386.6666666667, ans=15.0 2023-11-18 16:32:59,373 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 10950, loss[loss=0.08675, simple_loss=0.08854, pruned_loss=0.02888, audio_tagging_loss=0.0136, over 15271.00 frames. ], tot_loss[loss=0.1068, simple_loss=0.1207, pruned_loss=0.03469, audio_tagging_loss=0.01171, over 3040662.92 frames. ], batch size: 59, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:33:04,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=313453.3333333333, ans=0.2 2023-11-18 16:33:08,189 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 16:33:09,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=313520.0, ans=0.1 2023-11-18 16:33:16,521 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.810e+01 9.324e+01 1.025e+02 1.137e+02 1.491e+02, threshold=2.050e+02, percent-clipped=0.0 2023-11-18 16:33:24,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=313586.6666666667, ans=0.1 2023-11-18 16:33:41,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=313653.3333333333, ans=0.125 2023-11-18 16:33:46,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=313720.0, ans=0.0 2023-11-18 16:33:50,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=313720.0, ans=0.125 2023-11-18 16:33:53,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=313786.6666666667, ans=0.125 2023-11-18 16:33:54,233 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.79 vs. 
limit=10.0 2023-11-18 16:33:54,814 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 11000, loss[loss=0.08894, simple_loss=0.08394, pruned_loss=0.0283, audio_tagging_loss=0.01866, over 14572.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.1214, pruned_loss=0.03468, audio_tagging_loss=0.0117, over 3039960.86 frames. ], batch size: 57, lr: 1.56e-02, grad_scale: 64.0 2023-11-18 16:33:57,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=313786.6666666667, ans=0.125 2023-11-18 16:34:00,297 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.69 vs. limit=10.0 2023-11-18 16:34:05,970 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 16:34:25,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=313920.0, ans=0.125 2023-11-18 16:34:30,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=313986.6666666667, ans=0.125 2023-11-18 16:34:50,185 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 11050, loss[loss=0.1081, simple_loss=0.123, pruned_loss=0.03327, audio_tagging_loss=0.0133, over 15690.00 frames. ], tot_loss[loss=0.1081, simple_loss=0.1227, pruned_loss=0.03498, audio_tagging_loss=0.01178, over 3049153.53 frames. ], batch size: 57, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:35:00,977 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.72 vs. limit=22.5 2023-11-18 16:35:03,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=314186.6666666667, ans=0.125 2023-11-18 16:35:06,684 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.121e+01 9.418e+01 1.036e+02 1.168e+02 1.751e+02, threshold=2.073e+02, percent-clipped=0.0 2023-11-18 16:35:26,047 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 16:35:30,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=314320.0, ans=0.125 2023-11-18 16:35:35,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=314386.6666666667, ans=0.1 2023-11-18 16:35:41,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=314386.6666666667, ans=0.125 2023-11-18 16:35:41,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=314386.6666666667, ans=0.125 2023-11-18 16:35:45,672 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 11100, loss[loss=0.1294, simple_loss=0.1371, pruned_loss=0.04708, audio_tagging_loss=0.01376, over 15239.00 frames. 
], tot_loss[loss=0.1093, simple_loss=0.1238, pruned_loss=0.03553, audio_tagging_loss=0.01184, over 3048529.31 frames. ], batch size: 56, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:36:40,808 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 11150, loss[loss=0.134, simple_loss=0.1655, pruned_loss=0.04272, audio_tagging_loss=0.008548, over 16528.00 frames. ], tot_loss[loss=0.1083, simple_loss=0.1221, pruned_loss=0.03535, audio_tagging_loss=0.01185, over 3050456.82 frames. ], batch size: 59, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:36:46,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=314786.6666666667, ans=0.05 2023-11-18 16:36:52,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=314853.3333333333, ans=0.1 2023-11-18 16:36:54,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=314853.3333333333, ans=0.0 2023-11-18 16:36:58,979 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.048e+01 9.570e+01 1.059e+02 1.181e+02 1.990e+02, threshold=2.118e+02, percent-clipped=0.0 2023-11-18 16:37:10,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=314920.0, ans=0.0 2023-11-18 16:37:14,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=314986.6666666667, ans=0.125 2023-11-18 16:37:17,504 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0 2023-11-18 16:37:31,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=315053.3333333333, ans=0.5 2023-11-18 16:37:32,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=315053.3333333333, ans=0.125 2023-11-18 16:37:33,070 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2023-11-18 16:37:37,281 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 11200, loss[loss=0.1265, simple_loss=0.1446, pruned_loss=0.03901, audio_tagging_loss=0.01519, over 14828.00 frames. ], tot_loss[loss=0.108, simple_loss=0.1218, pruned_loss=0.03516, audio_tagging_loss=0.01199, over 3053488.15 frames. ], batch size: 55, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:37:48,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=315186.6666666667, ans=0.035 2023-11-18 16:38:12,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=315320.0, ans=0.2 2023-11-18 16:38:17,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=315320.0, ans=0.125 2023-11-18 16:38:31,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=315453.3333333333, ans=0.0 2023-11-18 16:38:32,600 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 11250, loss[loss=0.128, simple_loss=0.1438, pruned_loss=0.04677, audio_tagging_loss=0.009344, over 14493.00 frames. 
], tot_loss[loss=0.1075, simple_loss=0.1209, pruned_loss=0.03505, audio_tagging_loss=0.01199, over 3052909.71 frames. ], batch size: 55, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:38:48,487 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 9.211e+01 1.045e+02 1.164e+02 1.761e+02, threshold=2.090e+02, percent-clipped=0.0 2023-11-18 16:38:48,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=315520.0, ans=0.1 2023-11-18 16:38:48,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=315520.0, ans=0.125 2023-11-18 16:38:55,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=315586.6666666667, ans=0.0 2023-11-18 16:39:12,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=315653.3333333333, ans=0.2 2023-11-18 16:39:27,251 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 11300, loss[loss=0.06424, simple_loss=0.07239, pruned_loss=0.01763, audio_tagging_loss=0.01041, over 14425.00 frames. ], tot_loss[loss=0.1073, simple_loss=0.1207, pruned_loss=0.03505, audio_tagging_loss=0.01185, over 3061197.33 frames. ], batch size: 54, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:39:27,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=315786.6666666667, ans=0.125 2023-11-18 16:39:29,049 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.63 vs. limit=15.0 2023-11-18 16:39:45,491 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.89 vs. limit=22.5 2023-11-18 16:39:55,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=315920.0, ans=0.2 2023-11-18 16:40:03,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=315986.6666666667, ans=0.0 2023-11-18 16:40:08,944 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.43 vs. limit=15.0 2023-11-18 16:40:16,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=316053.3333333333, ans=0.125 2023-11-18 16:40:22,795 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 11350, loss[loss=0.08973, simple_loss=0.09995, pruned_loss=0.02937, audio_tagging_loss=0.01039, over 14077.00 frames. ], tot_loss[loss=0.1077, simple_loss=0.1216, pruned_loss=0.03523, audio_tagging_loss=0.01161, over 3054802.74 frames. ], batch size: 54, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:40:25,452 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.00 vs. limit=15.0 2023-11-18 16:40:26,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=316120.0, ans=0.02 2023-11-18 16:40:29,185 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.07 vs. 
limit=15.0 2023-11-18 16:40:34,559 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.56 vs. limit=22.5 2023-11-18 16:40:39,331 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.711e+01 9.469e+01 1.052e+02 1.138e+02 1.718e+02, threshold=2.104e+02, percent-clipped=0.0 2023-11-18 16:40:43,092 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.60 vs. limit=15.0 2023-11-18 16:41:00,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=316320.0, ans=0.125 2023-11-18 16:41:18,442 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 11400, loss[loss=0.08601, simple_loss=0.09453, pruned_loss=0.02737, audio_tagging_loss=0.01137, over 15355.00 frames. ], tot_loss[loss=0.1079, simple_loss=0.1222, pruned_loss=0.03533, audio_tagging_loss=0.01148, over 3056456.74 frames. ], batch size: 58, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:41:34,783 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.79 vs. limit=12.0 2023-11-18 16:41:40,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=316586.6666666667, ans=0.2 2023-11-18 16:41:41,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=316586.6666666667, ans=0.1 2023-11-18 16:41:45,464 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.08 vs. limit=15.0 2023-11-18 16:41:51,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=316653.3333333333, ans=0.125 2023-11-18 16:42:03,103 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.01 vs. limit=10.0 2023-11-18 16:42:13,244 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 11450, loss[loss=0.09111, simple_loss=0.1069, pruned_loss=0.02929, audio_tagging_loss=0.008365, over 14445.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.1209, pruned_loss=0.03513, audio_tagging_loss=0.01151, over 3049966.49 frames. 
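Note on the Whitening lines: these are periodic diagnostics from scaling.py's Whiten modules, which compare a layer's current whitening metric against its configured limit; some sampled entries are within budget (metric=14.07 vs. limit=15.0 above) while others exceed it (metric=23.12 vs. limit=22.5 earlier), and only in the latter case is a corrective gradient applied to push the layer's activation covariance back toward white. The metric computation below is a simplified stand-in, not the exact scaling.py code: it measures how unevenly variance is spread across eigendirections, equalling 1.0 for perfectly white features.

    import torch

    # Simplified stand-in for the whitening metric: the ratio between the
    # mean squared eigenvalue of the feature covariance and the square of
    # its mean; 1.0 when all eigenvalues are equal, large when a few
    # directions dominate.  Exact details (grouping, etc.) differ in icefall.
    def whitening_metric(x: torch.Tensor) -> float:
        x = x.reshape(-1, x.shape[-1])             # (frames, channels)
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]             # (channels, channels)
        eigs = torch.linalg.eigvalsh(cov)
        return float((eigs ** 2).mean() / eigs.mean() ** 2)

    scales = torch.ones(384)
    scales[:4] = 100.0                             # a few dominant directions
    metric = whitening_metric(torch.randn(1000, 384) * scales)
    limit = 22.5
    if metric > limit:                             # the case that triggers a penalty
        print(f"metric={metric:.2f} vs. limit={limit}")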
], batch size: 53, lr: 1.55e-02, grad_scale: 32.0 2023-11-18 16:42:17,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=316786.6666666667, ans=0.125 2023-11-18 16:42:21,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=316786.6666666667, ans=0.0 2023-11-18 16:42:30,664 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.176e+01 9.649e+01 1.077e+02 1.207e+02 1.681e+02, threshold=2.154e+02, percent-clipped=0.0 2023-11-18 16:42:35,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=316920.0, ans=0.125 2023-11-18 16:42:39,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=316920.0, ans=0.1 2023-11-18 16:42:51,043 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.67 vs. limit=22.5 2023-11-18 16:43:02,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=317053.3333333333, ans=0.0 2023-11-18 16:43:08,881 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 11500, loss[loss=0.1109, simple_loss=0.1288, pruned_loss=0.03275, audio_tagging_loss=0.01368, over 15101.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.121, pruned_loss=0.03506, audio_tagging_loss=0.01149, over 3052520.42 frames. ], batch size: 58, lr: 1.55e-02, grad_scale: 32.0 2023-11-18 16:43:19,829 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=6.075e-02 2023-11-18 16:43:19,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=317186.6666666667, ans=0.125 2023-11-18 16:44:05,434 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 11550, loss[loss=0.09841, simple_loss=0.1121, pruned_loss=0.0326, audio_tagging_loss=0.009736, over 15725.00 frames. ], tot_loss[loss=0.1073, simple_loss=0.1215, pruned_loss=0.0351, audio_tagging_loss=0.01142, over 3058500.53 frames. ], batch size: 62, lr: 1.55e-02, grad_scale: 16.0 2023-11-18 16:44:05,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=317453.3333333333, ans=0.1 2023-11-18 16:44:07,928 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.57 vs. limit=10.0 2023-11-18 16:44:09,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=317453.3333333333, ans=0.1 2023-11-18 16:44:15,631 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.41 vs. limit=15.0 2023-11-18 16:44:17,533 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.58 vs. 
limit=15.0 2023-11-18 16:44:21,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=317520.0, ans=0.1 2023-11-18 16:44:23,382 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.506e+01 9.289e+01 1.045e+02 1.175e+02 1.806e+02, threshold=2.091e+02, percent-clipped=0.0 2023-11-18 16:44:29,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=317586.6666666667, ans=0.0 2023-11-18 16:44:29,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=317586.6666666667, ans=0.1 2023-11-18 16:44:34,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=317586.6666666667, ans=0.125 2023-11-18 16:44:41,081 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 16:45:00,898 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 11600, loss[loss=0.1165, simple_loss=0.1301, pruned_loss=0.04222, audio_tagging_loss=0.009266, over 15409.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.1211, pruned_loss=0.03516, audio_tagging_loss=0.01142, over 3047633.39 frames. ], batch size: 57, lr: 1.55e-02, grad_scale: 32.0 2023-11-18 16:45:03,639 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.93 vs. limit=15.0 2023-11-18 16:45:11,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=317853.3333333333, ans=0.0 2023-11-18 16:45:40,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=317986.6666666667, ans=0.2 2023-11-18 16:45:40,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=317986.6666666667, ans=0.1 2023-11-18 16:45:47,113 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=15.0 2023-11-18 16:45:50,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=318053.3333333333, ans=0.1 2023-11-18 16:45:56,554 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 11650, loss[loss=0.08496, simple_loss=0.09604, pruned_loss=0.02374, audio_tagging_loss=0.0132, over 15660.00 frames. ], tot_loss[loss=0.1067, simple_loss=0.121, pruned_loss=0.03476, audio_tagging_loss=0.01145, over 3041952.52 frames. 
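Note on the grad_scale field: it tracks the dynamic loss-scaling factor used for mixed-precision training. In this stretch it sits at 32.0 for most of the epoch, doubles to 64.0 around batch 11000, and then backs off in two halving steps (64.0 at batch 11400, 32.0 at 11450, 16.0 at 11550) before recovering to 32.0 by batch 11600, consistent with two overflow events followed by regrowth. The sketch below shows the standard grow/backoff rule; the growth interval and the class itself are illustrative, and icefall's actual scaler logic may differ in its constants.

    # Schematic dynamic loss scaling: halve on inf/nan gradients, double
    # after a stretch of clean steps.  growth_interval is invented.
    class LossScale:
        def __init__(self, scale=32.0, growth_factor=2.0,
                     backoff_factor=0.5, growth_interval=1000):
            self.scale = scale
            self.growth_factor = growth_factor
            self.backoff_factor = backoff_factor
            self.growth_interval = growth_interval
            self._good_steps = 0

        def update(self, found_inf: bool) -> float:
            if found_inf:                  # overflow: shrink and restart counting
                self.scale *= self.backoff_factor
                self._good_steps = 0
            else:
                self._good_steps += 1
                if self._good_steps == self.growth_interval:
                    self.scale *= self.growth_factor
                    self._good_steps = 0
            return self.scale

    scaler = LossScale()
    print(scaler.update(found_inf=True))   # 16.0 after an overflow at scale 32.0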
], batch size: 59, lr: 1.55e-02, grad_scale: 32.0 2023-11-18 16:45:57,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=318120.0, ans=0.125 2023-11-18 16:46:05,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=318120.0, ans=0.125 2023-11-18 16:46:08,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=318186.6666666667, ans=0.125 2023-11-18 16:46:15,785 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.644e+01 9.455e+01 1.056e+02 1.163e+02 1.452e+02, threshold=2.111e+02, percent-clipped=0.0 2023-11-18 16:46:16,282 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.25 vs. limit=22.5 2023-11-18 16:46:25,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=318253.3333333333, ans=0.1 2023-11-18 16:46:39,150 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.57 vs. limit=6.0 2023-11-18 16:46:44,096 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 16:46:51,860 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 11700, loss[loss=0.09457, simple_loss=0.1008, pruned_loss=0.03378, audio_tagging_loss=0.01039, over 14793.00 frames. ], tot_loss[loss=0.1061, simple_loss=0.1202, pruned_loss=0.03446, audio_tagging_loss=0.01152, over 3043656.26 frames. ], batch size: 56, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:46:55,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=318453.3333333333, ans=0.1 2023-11-18 16:47:03,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=318520.0, ans=0.125 2023-11-18 16:47:11,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=318520.0, ans=0.0 2023-11-18 16:47:12,965 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.88 vs. limit=15.0 2023-11-18 16:47:27,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=318653.3333333333, ans=0.125 2023-11-18 16:47:43,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=318720.0, ans=0.125 2023-11-18 16:47:47,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=318786.6666666667, ans=0.2 2023-11-18 16:47:47,745 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 11750, loss[loss=0.1294, simple_loss=0.1576, pruned_loss=0.04194, audio_tagging_loss=0.008657, over 16109.00 frames. ], tot_loss[loss=0.1075, simple_loss=0.1216, pruned_loss=0.03514, audio_tagging_loss=0.0115, over 3044527.99 frames. 
], batch size: 60, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:48:06,362 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.155e+01 9.788e+01 1.106e+02 1.226e+02 1.834e+02, threshold=2.212e+02, percent-clipped=0.0 2023-11-18 16:48:06,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=318853.3333333333, ans=0.125 2023-11-18 16:48:07,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=318853.3333333333, ans=0.0 2023-11-18 16:48:14,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=318920.0, ans=0.2 2023-11-18 16:48:33,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=319053.3333333333, ans=0.0 2023-11-18 16:48:33,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=319053.3333333333, ans=0.035 2023-11-18 16:48:43,597 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 11800, loss[loss=0.1127, simple_loss=0.1295, pruned_loss=0.03795, audio_tagging_loss=0.009983, over 15521.00 frames. ], tot_loss[loss=0.1066, simple_loss=0.1204, pruned_loss=0.03482, audio_tagging_loss=0.01157, over 3046454.53 frames. ], batch size: 56, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:48:51,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=319120.0, ans=0.2 2023-11-18 16:49:30,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=319386.6666666667, ans=0.1 2023-11-18 16:49:39,792 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 11850, loss[loss=0.07265, simple_loss=0.06429, pruned_loss=0.02152, audio_tagging_loss=0.01898, over 15251.00 frames. ], tot_loss[loss=0.1064, simple_loss=0.1202, pruned_loss=0.03459, audio_tagging_loss=0.01169, over 3042121.39 frames. 
], batch size: 59, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:49:44,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=319453.3333333333, ans=0.1 2023-11-18 16:49:45,252 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=319453.3333333333, ans=0.125 2023-11-18 16:49:52,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=319520.0, ans=0.0 2023-11-18 16:49:58,413 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.932e+01 9.740e+01 1.079e+02 1.230e+02 2.254e+02, threshold=2.157e+02, percent-clipped=1.0 2023-11-18 16:49:59,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=319520.0, ans=0.125 2023-11-18 16:49:59,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=319520.0, ans=0.1 2023-11-18 16:50:02,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=319586.6666666667, ans=0.125 2023-11-18 16:50:10,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=319586.6666666667, ans=0.125 2023-11-18 16:50:25,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=319720.0, ans=0.125 2023-11-18 16:50:29,854 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 16:50:34,997 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 11900, loss[loss=0.08733, simple_loss=0.09639, pruned_loss=0.02572, audio_tagging_loss=0.01342, over 15153.00 frames. ], tot_loss[loss=0.1058, simple_loss=0.1195, pruned_loss=0.03429, audio_tagging_loss=0.01182, over 3040634.26 frames. 
], batch size: 57, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:50:42,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=319786.6666666667, ans=0.2 2023-11-18 16:50:45,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=319853.3333333333, ans=0.125 2023-11-18 16:50:45,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=319853.3333333333, ans=0.125 2023-11-18 16:51:02,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=319920.0, ans=0.0 2023-11-18 16:51:04,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=319920.0, ans=0.04949747468305833 2023-11-18 16:51:19,940 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 16:51:22,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=320053.3333333333, ans=22.5 2023-11-18 16:51:23,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=320053.3333333333, ans=0.125 2023-11-18 16:51:26,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=320053.3333333333, ans=0.125 2023-11-18 16:51:32,930 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 11950, loss[loss=0.08594, simple_loss=0.101, pruned_loss=0.02383, audio_tagging_loss=0.0116, over 16158.00 frames. ], tot_loss[loss=0.1061, simple_loss=0.1198, pruned_loss=0.03433, audio_tagging_loss=0.01183, over 3047616.99 frames. ], batch size: 62, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:51:52,478 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.985e+01 9.226e+01 1.013e+02 1.097e+02 1.681e+02, threshold=2.026e+02, percent-clipped=0.0 2023-11-18 16:52:11,303 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.12 vs. limit=15.0 2023-11-18 16:52:27,180 INFO [train_asr.py:1115] (3/4) Epoch 4, batch 12000, loss[loss=0.1221, simple_loss=0.1485, pruned_loss=0.03775, audio_tagging_loss=0.01005, over 16724.00 frames. ], tot_loss[loss=0.1053, simple_loss=0.119, pruned_loss=0.03386, audio_tagging_loss=0.01193, over 3041641.62 frames. ], batch size: 59, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:52:27,181 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-18 16:52:40,541 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.5074, 2.7862, 3.2872, 3.1289], device='cuda:3') 2023-11-18 16:52:43,169 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1371, 2.5615, 3.9891, 2.9280], device='cuda:3') 2023-11-18 16:53:00,109 INFO [train_asr.py:1147] (3/4) Epoch 4, validation: loss=0.07553, simple_loss=0.06151, pruned_loss=0.009833, audio_tagging_loss=0.03495, over 4681554.00 frames. 
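Note on the validation entry and epoch boundary: the validation loss combines the same way as the training losses, 0.5 * 0.06151 + 0.009833 + 0.03495 = 0.07554, matching the logged loss=0.07553 up to rounding (the interleaved zipformer.py lines print per-layer attention-weight entropies as a periodic diagnostic during the validation pass). Immediately after, the epoch rolls over: at "Epoch 5, batch 0" the tot_loss running averages reset, so tot_loss equals the single-batch loss, and the learning rate steps down from 1.54e-02 to 1.43e-02, consistent with a schedule that decays with both batch count and epoch count. A quick check of the validation figure, reusing the 0.5 weight on simple_loss inferred earlier (an assumption, not read from the training code):

    simple_loss, pruned_loss, audio_tagging_loss = 0.06151, 0.009833, 0.03495
    loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
    print(round(loss, 5))  # 0.07554, matching the logged loss=0.07553

Note also that validation audio_tagging_loss (0.03495) is roughly three times its training value here, while the transducer terms are far lower than in training; the validation set weights the two tasks differently than the mux'd training stream.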
2023-11-18 16:53:00,109 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-18 16:53:06,719 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.22 vs. limit=22.5 2023-11-18 16:53:07,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=320453.3333333333, ans=0.125 2023-11-18 16:53:11,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=320520.0, ans=0.0 2023-11-18 16:53:13,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=320520.0, ans=0.0 2023-11-18 16:54:03,925 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 0, loss[loss=0.1052, simple_loss=0.1055, pruned_loss=0.02858, audio_tagging_loss=0.02387, over 15011.00 frames. ], tot_loss[loss=0.1052, simple_loss=0.1055, pruned_loss=0.02858, audio_tagging_loss=0.02387, over 15011.00 frames. ], batch size: 59, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 16:54:03,926 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-18 16:54:35,508 INFO [train_asr.py:1147] (3/4) Epoch 5, validation: loss=0.07399, simple_loss=0.06162, pruned_loss=0.009934, audio_tagging_loss=0.03325, over 4681554.00 frames. 2023-11-18 16:54:35,509 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-18 16:54:43,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=320626.6666666667, ans=0.1 2023-11-18 16:54:47,200 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.02 vs. limit=22.5 2023-11-18 16:54:55,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=320693.3333333333, ans=0.035 2023-11-18 16:55:17,617 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.97 vs. limit=6.0 2023-11-18 16:55:21,608 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.951e+01 9.535e+01 1.056e+02 1.198e+02 1.542e+02, threshold=2.112e+02, percent-clipped=0.0 2023-11-18 16:55:31,306 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 50, loss[loss=0.1089, simple_loss=0.113, pruned_loss=0.03176, audio_tagging_loss=0.02065, over 14336.00 frames. ], tot_loss[loss=0.1179, simple_loss=0.1211, pruned_loss=0.03542, audio_tagging_loss=0.02199, over 682992.12 frames. ], batch size: 54, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 16:55:39,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=320960.0, ans=0.125 2023-11-18 16:55:41,117 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.85 vs. limit=15.0 2023-11-18 16:55:57,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=321093.3333333333, ans=0.0 2023-11-18 16:55:59,612 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.95 vs. 
limit=12.0 2023-11-18 16:56:24,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=321226.6666666667, ans=0.125 2023-11-18 16:56:26,663 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 100, loss[loss=0.07385, simple_loss=0.07213, pruned_loss=0.01716, audio_tagging_loss=0.02062, over 15917.00 frames. ], tot_loss[loss=0.1146, simple_loss=0.1182, pruned_loss=0.03378, audio_tagging_loss=0.02174, over 1210261.74 frames. ], batch size: 62, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 16:56:30,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=321293.3333333333, ans=0.125 2023-11-18 16:56:37,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=321360.0, ans=0.125 2023-11-18 16:56:51,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=321426.6666666667, ans=0.1 2023-11-18 16:56:52,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=321426.6666666667, ans=0.1 2023-11-18 16:57:12,265 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.378e+01 9.566e+01 1.064e+02 1.154e+02 1.620e+02, threshold=2.127e+02, percent-clipped=0.0 2023-11-18 16:57:12,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=321560.0, ans=0.125 2023-11-18 16:57:22,393 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 150, loss[loss=0.09879, simple_loss=0.1107, pruned_loss=0.03122, audio_tagging_loss=0.01221, over 15879.00 frames. ], tot_loss[loss=0.1127, simple_loss=0.1188, pruned_loss=0.03394, audio_tagging_loss=0.0194, over 1614638.38 frames. ], batch size: 61, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 16:57:23,014 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.66 vs. limit=10.0 2023-11-18 16:57:23,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=321626.6666666667, ans=0.0 2023-11-18 16:57:27,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=321626.6666666667, ans=0.125 2023-11-18 16:57:43,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=321760.0, ans=0.1 2023-11-18 16:57:46,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=321760.0, ans=0.07 2023-11-18 16:58:02,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=321826.6666666667, ans=0.0 2023-11-18 16:58:17,755 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 200, loss[loss=0.1349, simple_loss=0.1572, pruned_loss=0.04343, audio_tagging_loss=0.0128, over 15203.00 frames. ], tot_loss[loss=0.1113, simple_loss=0.1198, pruned_loss=0.03437, audio_tagging_loss=0.01703, over 1926703.84 frames. 
], batch size: 56, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 16:58:22,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=321960.0, ans=0.125 2023-11-18 16:59:03,729 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.692e+01 9.231e+01 1.044e+02 1.147e+02 1.591e+02, threshold=2.089e+02, percent-clipped=0.0 2023-11-18 16:59:14,445 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 250, loss[loss=0.09441, simple_loss=0.1149, pruned_loss=0.02714, audio_tagging_loss=0.009817, over 16391.00 frames. ], tot_loss[loss=0.1101, simple_loss=0.1207, pruned_loss=0.0345, audio_tagging_loss=0.01524, over 2179307.36 frames. ], batch size: 61, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 16:59:25,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.69 vs. limit=22.5 2023-11-18 16:59:27,984 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.61 vs. limit=12.0 2023-11-18 16:59:30,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=322360.0, ans=0.0 2023-11-18 16:59:42,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=322426.6666666667, ans=0.125 2023-11-18 16:59:44,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=322426.6666666667, ans=0.0 2023-11-18 16:59:47,542 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.419e-01 2023-11-18 16:59:56,508 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.79 vs. limit=15.0 2023-11-18 17:00:09,719 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 300, loss[loss=0.09539, simple_loss=0.1128, pruned_loss=0.02933, audio_tagging_loss=0.009672, over 16075.00 frames. ], tot_loss[loss=0.1084, simple_loss=0.1201, pruned_loss=0.0343, audio_tagging_loss=0.01406, over 2369531.35 frames. ], batch size: 60, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 17:00:12,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=322626.6666666667, ans=0.125 2023-11-18 17:00:21,426 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.24 vs. limit=15.0 2023-11-18 17:00:35,493 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.99 vs. limit=10.0 2023-11-18 17:00:43,869 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.35 vs. 
limit=15.0 2023-11-18 17:00:49,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=322826.6666666667, ans=0.07 2023-11-18 17:00:56,006 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.558e+01 9.144e+01 1.032e+02 1.177e+02 1.892e+02, threshold=2.064e+02, percent-clipped=0.0 2023-11-18 17:01:02,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=322893.3333333333, ans=0.1 2023-11-18 17:01:06,801 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 350, loss[loss=0.1388, simple_loss=0.1615, pruned_loss=0.04955, audio_tagging_loss=0.008436, over 15596.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.1195, pruned_loss=0.03398, audio_tagging_loss=0.01333, over 2523310.47 frames. ], batch size: 58, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 17:01:07,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=322960.0, ans=0.125 2023-11-18 17:01:10,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=322960.0, ans=0.0 2023-11-18 17:01:24,396 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.59 vs. limit=10.0 2023-11-18 17:01:26,872 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.59 vs. limit=15.0 2023-11-18 17:01:27,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=323026.6666666667, ans=0.125 2023-11-18 17:01:42,341 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.18 vs. limit=15.0 2023-11-18 17:02:03,604 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 400, loss[loss=0.1013, simple_loss=0.1215, pruned_loss=0.03166, audio_tagging_loss=0.008928, over 14487.00 frames. ], tot_loss[loss=0.107, simple_loss=0.1201, pruned_loss=0.03422, audio_tagging_loss=0.0128, over 2638578.76 frames. ], batch size: 54, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 17:02:04,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=323293.3333333333, ans=0.0 2023-11-18 17:02:11,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=323293.3333333333, ans=0.125 2023-11-18 17:02:35,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=323493.3333333333, ans=0.0 2023-11-18 17:02:49,144 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 9.956e+01 1.111e+02 1.271e+02 1.658e+02, threshold=2.223e+02, percent-clipped=0.0 2023-11-18 17:02:51,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=323560.0, ans=22.5 2023-11-18 17:02:58,840 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 450, loss[loss=0.1447, simple_loss=0.1661, pruned_loss=0.0531, audio_tagging_loss=0.008589, over 15242.00 frames. ], tot_loss[loss=0.1057, simple_loss=0.1188, pruned_loss=0.03389, audio_tagging_loss=0.01242, over 2727454.51 frames. 
], batch size: 53, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 17:03:01,365 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.89 vs. limit=15.0 2023-11-18 17:03:36,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=323826.6666666667, ans=0.0 2023-11-18 17:03:39,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=323826.6666666667, ans=0.0 2023-11-18 17:03:48,340 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.75 vs. limit=15.0 2023-11-18 17:03:49,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=323893.3333333333, ans=0.125 2023-11-18 17:03:54,618 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 500, loss[loss=0.1331, simple_loss=0.1608, pruned_loss=0.04424, audio_tagging_loss=0.008487, over 15905.00 frames. ], tot_loss[loss=0.105, simple_loss=0.1181, pruned_loss=0.03378, audio_tagging_loss=0.01223, over 2794111.29 frames. ], batch size: 57, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 17:03:56,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=323960.0, ans=0.125 2023-11-18 17:03:57,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=323960.0, ans=0.0 2023-11-18 17:04:14,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=324026.6666666667, ans=0.04949747468305833 2023-11-18 17:04:24,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=324093.3333333333, ans=0.1 2023-11-18 17:04:29,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=324160.0, ans=0.1 2023-11-18 17:04:32,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=324160.0, ans=0.125 2023-11-18 17:04:40,510 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.977e+01 9.068e+01 9.771e+01 1.090e+02 1.763e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-18 17:04:51,728 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 550, loss[loss=0.1152, simple_loss=0.1368, pruned_loss=0.03958, audio_tagging_loss=0.007176, over 15432.00 frames. ], tot_loss[loss=0.1054, simple_loss=0.1191, pruned_loss=0.03391, audio_tagging_loss=0.01195, over 2845640.45 frames. ], batch size: 58, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 17:05:42,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=324560.0, ans=0.125 2023-11-18 17:05:44,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=324560.0, ans=0.1 2023-11-18 17:05:46,822 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 600, loss[loss=0.1038, simple_loss=0.1156, pruned_loss=0.03222, audio_tagging_loss=0.01381, over 14866.00 frames. ], tot_loss[loss=0.1044, simple_loss=0.1178, pruned_loss=0.03358, audio_tagging_loss=0.01192, over 2890806.61 frames. 
], batch size: 57, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:05:55,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=324626.6666666667, ans=0.125 2023-11-18 17:06:12,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=324760.0, ans=0.0 2023-11-18 17:06:15,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=324760.0, ans=0.1 2023-11-18 17:06:15,623 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 17:06:18,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=324760.0, ans=0.1 2023-11-18 17:06:32,202 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.460e+01 9.095e+01 1.023e+02 1.155e+02 1.808e+02, threshold=2.046e+02, percent-clipped=0.0 2023-11-18 17:06:38,228 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.31 vs. limit=15.0 2023-11-18 17:06:42,441 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 650, loss[loss=0.1083, simple_loss=0.1317, pruned_loss=0.03358, audio_tagging_loss=0.008852, over 15239.00 frames. ], tot_loss[loss=0.1051, simple_loss=0.1186, pruned_loss=0.03391, audio_tagging_loss=0.01188, over 2930831.80 frames. ], batch size: 54, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:06:52,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=325026.6666666667, ans=0.125 2023-11-18 17:07:00,735 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.249e-03 2023-11-18 17:07:01,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=325026.6666666667, ans=0.0 2023-11-18 17:07:09,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=325093.3333333333, ans=0.1 2023-11-18 17:07:13,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=325093.3333333333, ans=0.0 2023-11-18 17:07:37,857 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 700, loss[loss=0.11, simple_loss=0.1201, pruned_loss=0.03516, audio_tagging_loss=0.0148, over 15457.00 frames. ], tot_loss[loss=0.1053, simple_loss=0.1195, pruned_loss=0.03379, audio_tagging_loss=0.01174, over 2956844.62 frames. ], batch size: 60, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:07:53,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=325360.0, ans=0.125 2023-11-18 17:08:05,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=325426.6666666667, ans=0.125 2023-11-18 17:08:06,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=325426.6666666667, ans=0.125 2023-11-18 17:08:16,015 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.94 vs. 
limit=15.0 2023-11-18 17:08:24,353 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.255e+01 9.264e+01 9.972e+01 1.138e+02 1.580e+02, threshold=1.994e+02, percent-clipped=0.0 2023-11-18 17:08:27,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=325560.0, ans=0.2 2023-11-18 17:08:33,994 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 750, loss[loss=0.08305, simple_loss=0.09686, pruned_loss=0.02062, audio_tagging_loss=0.014, over 14102.00 frames. ], tot_loss[loss=0.1048, simple_loss=0.1189, pruned_loss=0.03364, audio_tagging_loss=0.01176, over 2972925.98 frames. ], batch size: 54, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:08:50,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=325693.3333333333, ans=0.2 2023-11-18 17:08:54,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=325760.0, ans=0.0 2023-11-18 17:08:58,450 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.07 vs. limit=15.0 2023-11-18 17:09:07,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=325826.6666666667, ans=0.125 2023-11-18 17:09:29,754 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 800, loss[loss=0.1073, simple_loss=0.1149, pruned_loss=0.03652, audio_tagging_loss=0.01332, over 14621.00 frames. ], tot_loss[loss=0.1057, simple_loss=0.1198, pruned_loss=0.034, audio_tagging_loss=0.01178, over 2989502.66 frames. ], batch size: 57, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:09:30,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=325960.0, ans=0.125 2023-11-18 17:09:39,811 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.33 vs. limit=22.5 2023-11-18 17:09:57,486 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.38 vs. limit=15.0 2023-11-18 17:10:08,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=326160.0, ans=10.0 2023-11-18 17:10:15,475 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.292e+01 9.696e+01 1.116e+02 1.261e+02 1.745e+02, threshold=2.231e+02, percent-clipped=0.0 2023-11-18 17:10:24,962 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 850, loss[loss=0.07907, simple_loss=0.07994, pruned_loss=0.02366, audio_tagging_loss=0.01544, over 15330.00 frames. ], tot_loss[loss=0.1058, simple_loss=0.1198, pruned_loss=0.03407, audio_tagging_loss=0.01182, over 2999632.19 frames. ], batch size: 57, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:10:26,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=326293.3333333333, ans=0.04949747468305833 2023-11-18 17:10:29,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.33 vs. 
limit=10.0 2023-11-18 17:11:15,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=326560.0, ans=0.125 2023-11-18 17:11:20,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=326626.6666666667, ans=0.0 2023-11-18 17:11:21,920 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 900, loss[loss=0.1105, simple_loss=0.1333, pruned_loss=0.03432, audio_tagging_loss=0.009523, over 15990.00 frames. ], tot_loss[loss=0.106, simple_loss=0.12, pruned_loss=0.03414, audio_tagging_loss=0.01188, over 3011390.43 frames. ], batch size: 58, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:11:26,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=326626.6666666667, ans=10.0 2023-11-18 17:11:27,404 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 17:11:28,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=326626.6666666667, ans=0.0 2023-11-18 17:11:33,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=326693.3333333333, ans=0.125 2023-11-18 17:11:50,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=326760.0, ans=0.1 2023-11-18 17:11:56,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=326826.6666666667, ans=0.2 2023-11-18 17:12:08,183 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 9.412e+01 1.033e+02 1.138e+02 1.840e+02, threshold=2.065e+02, percent-clipped=0.0 2023-11-18 17:12:17,651 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 950, loss[loss=0.08868, simple_loss=0.1032, pruned_loss=0.02686, audio_tagging_loss=0.01019, over 15718.00 frames. ], tot_loss[loss=0.1055, simple_loss=0.1195, pruned_loss=0.03392, audio_tagging_loss=0.01178, over 3022418.17 frames. ], batch size: 60, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:12:38,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=327026.6666666667, ans=0.125 2023-11-18 17:12:54,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=327160.0, ans=0.125 2023-11-18 17:13:06,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=327226.6666666667, ans=0.125 2023-11-18 17:13:06,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=327226.6666666667, ans=0.95 2023-11-18 17:13:13,012 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.17 vs. limit=10.0 2023-11-18 17:13:13,621 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 1000, loss[loss=0.09858, simple_loss=0.1111, pruned_loss=0.03359, audio_tagging_loss=0.009415, over 14935.00 frames. ], tot_loss[loss=0.1045, simple_loss=0.1186, pruned_loss=0.03363, audio_tagging_loss=0.01156, over 3021628.12 frames. 
], batch size: 56, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:13:14,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=327293.3333333333, ans=0.0 2023-11-18 17:13:18,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=327293.3333333333, ans=0.125 2023-11-18 17:13:20,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=327293.3333333333, ans=0.125 2023-11-18 17:13:21,908 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.94 vs. limit=22.5 2023-11-18 17:13:32,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=327360.0, ans=0.125 2023-11-18 17:13:38,758 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 17:13:46,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=327426.6666666667, ans=0.125 2023-11-18 17:13:56,033 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.30 vs. limit=12.0 2023-11-18 17:14:00,045 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 9.245e+01 1.040e+02 1.129e+02 1.708e+02, threshold=2.081e+02, percent-clipped=0.0 2023-11-18 17:14:10,359 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 1050, loss[loss=0.08907, simple_loss=0.08899, pruned_loss=0.02603, audio_tagging_loss=0.01854, over 13899.00 frames. ], tot_loss[loss=0.1032, simple_loss=0.117, pruned_loss=0.03322, audio_tagging_loss=0.0115, over 3022593.81 frames. ], batch size: 53, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:14:21,122 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.83 vs. limit=22.5 2023-11-18 17:14:21,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=327693.3333333333, ans=0.125 2023-11-18 17:14:25,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=327693.3333333333, ans=0.0 2023-11-18 17:14:27,132 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.65 vs. 
limit=6.0 2023-11-18 17:14:44,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=327826.6666666667, ans=0.2 2023-11-18 17:14:56,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=327893.3333333333, ans=0.0 2023-11-18 17:15:02,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=327893.3333333333, ans=0.125 2023-11-18 17:15:06,212 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 1100, loss[loss=0.119, simple_loss=0.1345, pruned_loss=0.03935, audio_tagging_loss=0.01241, over 14661.00 frames. ], tot_loss[loss=0.1035, simple_loss=0.1174, pruned_loss=0.03328, audio_tagging_loss=0.01148, over 3024344.13 frames. ], batch size: 55, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:15:09,917 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 17:15:15,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=327960.0, ans=0.1 2023-11-18 17:15:32,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=328093.3333333333, ans=10.0 2023-11-18 17:15:52,625 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.960e+01 8.982e+01 9.877e+01 1.122e+02 1.591e+02, threshold=1.975e+02, percent-clipped=0.0 2023-11-18 17:15:54,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=328226.6666666667, ans=0.1 2023-11-18 17:15:59,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=328226.6666666667, ans=0.0 2023-11-18 17:16:02,178 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 1150, loss[loss=0.1199, simple_loss=0.1344, pruned_loss=0.04317, audio_tagging_loss=0.009502, over 15574.00 frames. ], tot_loss[loss=0.1033, simple_loss=0.1173, pruned_loss=0.03318, audio_tagging_loss=0.01144, over 3023848.72 frames. ], batch size: 58, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:16:02,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.44 vs. 
limit=6.0 2023-11-18 17:16:10,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=328293.3333333333, ans=0.1 2023-11-18 17:16:17,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=328360.0, ans=0.2 2023-11-18 17:16:17,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=328360.0, ans=0.2 2023-11-18 17:16:34,526 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 17:16:51,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=328560.0, ans=0.125 2023-11-18 17:16:51,895 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.53 vs. limit=6.0 2023-11-18 17:16:53,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=328560.0, ans=0.0 2023-11-18 17:16:58,662 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 1200, loss[loss=0.09971, simple_loss=0.1184, pruned_loss=0.02806, audio_tagging_loss=0.01243, over 14877.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.117, pruned_loss=0.03285, audio_tagging_loss=0.01139, over 3031915.56 frames. ], batch size: 57, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:17:07,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=328626.6666666667, ans=0.125 2023-11-18 17:17:14,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=328693.3333333333, ans=0.0 2023-11-18 17:17:19,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=328760.0, ans=0.2 2023-11-18 17:17:20,452 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.39 vs. limit=15.0 2023-11-18 17:17:38,536 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.12 vs. limit=10.0 2023-11-18 17:17:40,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=328826.6666666667, ans=0.2 2023-11-18 17:17:44,329 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 9.512e+01 1.070e+02 1.237e+02 2.001e+02, threshold=2.140e+02, percent-clipped=1.0 2023-11-18 17:17:47,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=328893.3333333333, ans=0.125 2023-11-18 17:17:54,559 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 1250, loss[loss=0.08924, simple_loss=0.09977, pruned_loss=0.02827, audio_tagging_loss=0.01109, over 14767.00 frames. ], tot_loss[loss=0.1032, simple_loss=0.1175, pruned_loss=0.03316, audio_tagging_loss=0.01134, over 3031323.29 frames. 
], batch size: 56, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:18:15,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=329026.6666666667, ans=0.125 2023-11-18 17:18:16,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=329093.3333333333, ans=0.125 2023-11-18 17:18:32,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=329160.0, ans=0.125 2023-11-18 17:18:35,010 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.31 vs. limit=22.5 2023-11-18 17:18:35,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=329160.0, ans=0.0 2023-11-18 17:18:41,564 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=15.49 vs. limit=15.0 2023-11-18 17:18:50,347 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 1300, loss[loss=0.139, simple_loss=0.1649, pruned_loss=0.04641, audio_tagging_loss=0.01017, over 14975.00 frames. ], tot_loss[loss=0.1035, simple_loss=0.1177, pruned_loss=0.0333, audio_tagging_loss=0.01133, over 3035103.41 frames. ], batch size: 54, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:19:08,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=329360.0, ans=0.125 2023-11-18 17:19:31,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=329493.3333333333, ans=0.035 2023-11-18 17:19:36,208 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 9.287e+01 1.028e+02 1.154e+02 1.858e+02, threshold=2.056e+02, percent-clipped=0.0 2023-11-18 17:19:36,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=329560.0, ans=0.02 2023-11-18 17:19:38,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=329560.0, ans=0.125 2023-11-18 17:19:44,304 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.23 vs. limit=15.0 2023-11-18 17:19:46,961 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 1350, loss[loss=0.1255, simple_loss=0.1476, pruned_loss=0.03981, audio_tagging_loss=0.01195, over 16414.00 frames. ], tot_loss[loss=0.1041, simple_loss=0.1185, pruned_loss=0.03353, audio_tagging_loss=0.01134, over 3042292.12 frames. ], batch size: 58, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:20:08,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=329760.0, ans=0.125 2023-11-18 17:20:26,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=329826.6666666667, ans=0.0 2023-11-18 17:20:28,580 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 17:20:29,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=329826.6666666667, ans=0.2 2023-11-18 17:20:37,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=329893.3333333333, ans=0.0 2023-11-18 17:20:42,466 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 1400, loss[loss=0.08286, simple_loss=0.09324, pruned_loss=0.02569, audio_tagging_loss=0.01055, over 15301.00 frames. ], tot_loss[loss=0.1045, simple_loss=0.119, pruned_loss=0.03358, audio_tagging_loss=0.01143, over 3047848.94 frames. ], batch size: 58, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:20:45,069 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.34 vs. limit=15.0 2023-11-18 17:20:55,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=330026.6666666667, ans=0.1 2023-11-18 17:21:21,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=330160.0, ans=0.0 2023-11-18 17:21:27,665 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.572e+01 9.034e+01 1.046e+02 1.162e+02 2.116e+02, threshold=2.092e+02, percent-clipped=1.0 2023-11-18 17:21:28,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=330226.6666666667, ans=0.1 2023-11-18 17:21:38,685 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 1450, loss[loss=0.1063, simple_loss=0.1248, pruned_loss=0.03134, audio_tagging_loss=0.01259, over 15140.00 frames. ], tot_loss[loss=0.1046, simple_loss=0.1193, pruned_loss=0.0335, audio_tagging_loss=0.01142, over 3042700.94 frames. ], batch size: 56, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:21:53,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=330360.0, ans=0.0 2023-11-18 17:22:13,923 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.20 vs. limit=22.5 2023-11-18 17:22:19,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=330493.3333333333, ans=0.0 2023-11-18 17:22:22,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=330560.0, ans=0.1 2023-11-18 17:22:34,986 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 1500, loss[loss=0.1086, simple_loss=0.1219, pruned_loss=0.03212, audio_tagging_loss=0.01557, over 16001.00 frames. ], tot_loss[loss=0.1055, simple_loss=0.1201, pruned_loss=0.03391, audio_tagging_loss=0.01151, over 3049267.91 frames. ], batch size: 63, lr: 1.41e-02, grad_scale: 64.0 2023-11-18 17:22:41,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=330626.6666666667, ans=0.125 2023-11-18 17:22:48,523 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.75 vs. 
limit=6.0 2023-11-18 17:23:04,677 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.05 vs. limit=12.0 2023-11-18 17:23:11,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=330826.6666666667, ans=0.125 2023-11-18 17:23:14,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=330826.6666666667, ans=0.125 2023-11-18 17:23:20,981 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.810e+01 9.325e+01 1.048e+02 1.222e+02 1.829e+02, threshold=2.095e+02, percent-clipped=0.0 2023-11-18 17:23:30,604 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 1550, loss[loss=0.1271, simple_loss=0.145, pruned_loss=0.04458, audio_tagging_loss=0.009996, over 15367.00 frames. ], tot_loss[loss=0.1061, simple_loss=0.1208, pruned_loss=0.03405, audio_tagging_loss=0.01161, over 3054237.27 frames. ], batch size: 55, lr: 1.41e-02, grad_scale: 64.0 2023-11-18 17:24:01,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=331093.3333333333, ans=0.125 2023-11-18 17:24:14,689 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.27 vs. limit=15.0 2023-11-18 17:24:15,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=331226.6666666667, ans=0.1 2023-11-18 17:24:16,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=331226.6666666667, ans=0.125 2023-11-18 17:24:26,282 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 1600, loss[loss=0.1162, simple_loss=0.1266, pruned_loss=0.03979, audio_tagging_loss=0.0131, over 14628.00 frames. ], tot_loss[loss=0.1053, simple_loss=0.1199, pruned_loss=0.03371, audio_tagging_loss=0.01165, over 3048916.40 frames. ], batch size: 56, lr: 1.41e-02, grad_scale: 64.0 2023-11-18 17:24:26,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=331293.3333333333, ans=0.125 2023-11-18 17:24:33,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=331293.3333333333, ans=0.125 2023-11-18 17:24:41,555 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.96 vs. limit=22.5 2023-11-18 17:24:44,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=331360.0, ans=0.125 2023-11-18 17:25:00,568 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.23 vs. 
limit=22.5 2023-11-18 17:25:07,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=331493.3333333333, ans=0.125 2023-11-18 17:25:12,768 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.226e+01 9.229e+01 1.020e+02 1.137e+02 1.712e+02, threshold=2.040e+02, percent-clipped=0.0 2023-11-18 17:25:21,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=331626.6666666667, ans=0.035 2023-11-18 17:25:22,841 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.85 vs. limit=15.0 2023-11-18 17:25:23,399 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 1650, loss[loss=0.07697, simple_loss=0.08164, pruned_loss=0.02479, audio_tagging_loss=0.01136, over 14466.00 frames. ], tot_loss[loss=0.1043, simple_loss=0.1184, pruned_loss=0.0333, audio_tagging_loss=0.01181, over 3046890.19 frames. ], batch size: 55, lr: 1.41e-02, grad_scale: 64.0 2023-11-18 17:25:32,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=331626.6666666667, ans=0.1 2023-11-18 17:25:40,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=331693.3333333333, ans=0.125 2023-11-18 17:25:47,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=331760.0, ans=0.125 2023-11-18 17:26:05,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=331826.6666666667, ans=0.2 2023-11-18 17:26:18,875 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 1700, loss[loss=0.1138, simple_loss=0.1277, pruned_loss=0.04117, audio_tagging_loss=0.00873, over 15658.00 frames. ], tot_loss[loss=0.104, simple_loss=0.1182, pruned_loss=0.03303, audio_tagging_loss=0.01183, over 3053478.99 frames. ], batch size: 57, lr: 1.41e-02, grad_scale: 64.0 2023-11-18 17:26:20,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=331960.0, ans=0.2 2023-11-18 17:26:25,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=331960.0, ans=0.125 2023-11-18 17:26:31,108 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.67 vs. 
limit=22.5 2023-11-18 17:26:39,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=332026.6666666667, ans=0.125 2023-11-18 17:27:04,275 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 17:27:06,290 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.601e+01 9.403e+01 1.012e+02 1.120e+02 1.645e+02, threshold=2.024e+02, percent-clipped=0.0 2023-11-18 17:27:13,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=332226.6666666667, ans=0.95 2023-11-18 17:27:13,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=332226.6666666667, ans=0.05 2023-11-18 17:27:13,935 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.10 vs. limit=15.0 2023-11-18 17:27:14,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=332293.3333333333, ans=0.125 2023-11-18 17:27:15,292 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 1750, loss[loss=0.08704, simple_loss=0.1048, pruned_loss=0.02098, audio_tagging_loss=0.01365, over 15880.00 frames. ], tot_loss[loss=0.103, simple_loss=0.1175, pruned_loss=0.03256, audio_tagging_loss=0.01168, over 3061342.16 frames. ], batch size: 59, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:27:24,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=332293.3333333333, ans=0.125 2023-11-18 17:27:29,277 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 17:27:36,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=332360.0, ans=0.1 2023-11-18 17:27:38,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=332426.6666666667, ans=0.125 2023-11-18 17:27:47,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=332426.6666666667, ans=0.05 2023-11-18 17:27:52,326 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 17:28:04,100 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.06 vs. limit=10.0 2023-11-18 17:28:10,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=332626.6666666667, ans=0.125 2023-11-18 17:28:11,839 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 1800, loss[loss=0.1039, simple_loss=0.1191, pruned_loss=0.0337, audio_tagging_loss=0.01067, over 13993.00 frames. ], tot_loss[loss=0.1034, simple_loss=0.1181, pruned_loss=0.03286, audio_tagging_loss=0.01156, over 3063583.98 frames. ], batch size: 55, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:28:14,941 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.07 vs. 
limit=15.0 2023-11-18 17:28:31,187 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.15 vs. limit=15.0 2023-11-18 17:28:33,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=332760.0, ans=0.125 2023-11-18 17:28:41,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=332760.0, ans=0.1 2023-11-18 17:28:47,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=332826.6666666667, ans=0.0 2023-11-18 17:28:59,147 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.466e+01 9.003e+01 9.997e+01 1.070e+02 1.437e+02, threshold=1.999e+02, percent-clipped=0.0 2023-11-18 17:29:03,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=332893.3333333333, ans=0.0 2023-11-18 17:29:07,652 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 1850, loss[loss=0.1377, simple_loss=0.1661, pruned_loss=0.04814, audio_tagging_loss=0.006504, over 15829.00 frames. ], tot_loss[loss=0.1038, simple_loss=0.1184, pruned_loss=0.03305, audio_tagging_loss=0.01155, over 3058220.22 frames. ], batch size: 56, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:29:25,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=333026.6666666667, ans=0.0 2023-11-18 17:29:31,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=333093.3333333333, ans=0.125 2023-11-18 17:29:33,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=333093.3333333333, ans=0.125 2023-11-18 17:30:04,071 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 1900, loss[loss=0.1459, simple_loss=0.1735, pruned_loss=0.05094, audio_tagging_loss=0.008207, over 15256.00 frames. ], tot_loss[loss=0.1042, simple_loss=0.1191, pruned_loss=0.03328, audio_tagging_loss=0.01142, over 3057123.85 frames. 
], batch size: 56, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:30:08,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=333293.3333333333, ans=0.1 2023-11-18 17:30:24,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=333360.0, ans=0.125 2023-11-18 17:30:29,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=333426.6666666667, ans=0.2 2023-11-18 17:30:38,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=333493.3333333333, ans=0.2 2023-11-18 17:30:42,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=333493.3333333333, ans=0.1 2023-11-18 17:30:45,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=333493.3333333333, ans=0.2 2023-11-18 17:30:50,674 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.294e+01 8.811e+01 9.778e+01 1.068e+02 1.411e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-18 17:30:59,231 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 1950, loss[loss=0.1237, simple_loss=0.1303, pruned_loss=0.04606, audio_tagging_loss=0.01246, over 15479.00 frames. ], tot_loss[loss=0.1044, simple_loss=0.119, pruned_loss=0.03334, audio_tagging_loss=0.01151, over 3058279.39 frames. ], batch size: 57, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:31:04,787 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=15.10 vs. limit=15.0 2023-11-18 17:31:18,034 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 17:31:21,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=333760.0, ans=0.2 2023-11-18 17:31:33,669 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.31 vs. limit=15.0 2023-11-18 17:31:41,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=333826.6666666667, ans=0.125 2023-11-18 17:31:47,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=333893.3333333333, ans=0.0 2023-11-18 17:31:48,255 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.86 vs. limit=15.0 2023-11-18 17:31:56,045 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 2000, loss[loss=0.09164, simple_loss=0.0975, pruned_loss=0.02985, audio_tagging_loss=0.01304, over 15022.00 frames. ], tot_loss[loss=0.1037, simple_loss=0.1181, pruned_loss=0.0331, audio_tagging_loss=0.01157, over 3054928.25 frames. 
], batch size: 59, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:32:04,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=333960.0, ans=0.2 2023-11-18 17:32:07,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=334026.6666666667, ans=0.95 2023-11-18 17:32:32,777 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.23 vs. limit=15.0 2023-11-18 17:32:35,943 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.21 vs. limit=15.0 2023-11-18 17:32:37,008 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.57 vs. limit=12.0 2023-11-18 17:32:42,818 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 9.028e+01 9.681e+01 1.107e+02 1.913e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-18 17:32:51,953 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 2050, loss[loss=0.1259, simple_loss=0.1501, pruned_loss=0.04204, audio_tagging_loss=0.008768, over 15473.00 frames. ], tot_loss[loss=0.1035, simple_loss=0.1181, pruned_loss=0.03297, audio_tagging_loss=0.01146, over 3050292.43 frames. ], batch size: 56, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:32:52,088 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 17:32:52,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=334293.3333333333, ans=0.1 2023-11-18 17:32:59,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=334293.3333333333, ans=0.125 2023-11-18 17:33:10,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=334360.0, ans=0.125 2023-11-18 17:33:11,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=334360.0, ans=0.125 2023-11-18 17:33:13,975 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.09 vs. limit=22.5 2023-11-18 17:33:47,586 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 2100, loss[loss=0.08784, simple_loss=0.1101, pruned_loss=0.02179, audio_tagging_loss=0.01099, over 15441.00 frames. ], tot_loss[loss=0.1024, simple_loss=0.1171, pruned_loss=0.03251, audio_tagging_loss=0.01138, over 3044913.63 frames. ], batch size: 59, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:33:54,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=334626.6666666667, ans=0.125 2023-11-18 17:34:19,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=334760.0, ans=0.1 2023-11-18 17:34:27,306 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.34 vs. 
limit=15.0 2023-11-18 17:34:28,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=334826.6666666667, ans=0.2 2023-11-18 17:34:30,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=334826.6666666667, ans=0.125 2023-11-18 17:34:34,905 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.950e+01 9.682e+01 1.081e+02 1.226e+02 1.656e+02, threshold=2.162e+02, percent-clipped=0.0 2023-11-18 17:34:38,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=334893.3333333333, ans=0.1 2023-11-18 17:34:39,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=334893.3333333333, ans=0.125 2023-11-18 17:34:43,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=334960.0, ans=0.5 2023-11-18 17:34:44,636 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 2150, loss[loss=0.1059, simple_loss=0.1168, pruned_loss=0.0364, audio_tagging_loss=0.0111, over 15949.00 frames. ], tot_loss[loss=0.1024, simple_loss=0.1169, pruned_loss=0.03258, audio_tagging_loss=0.0114, over 3043190.59 frames. ], batch size: 59, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:34:51,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=334960.0, ans=0.0 2023-11-18 17:35:12,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=335093.3333333333, ans=0.0 2023-11-18 17:35:12,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=335093.3333333333, ans=0.125 2023-11-18 17:35:17,636 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 17:35:23,991 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.54 vs. limit=15.0 2023-11-18 17:35:25,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=335160.0, ans=0.125 2023-11-18 17:35:33,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=335226.6666666667, ans=0.2 2023-11-18 17:35:39,489 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.45 vs. limit=15.0 2023-11-18 17:35:39,926 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 2200, loss[loss=0.1097, simple_loss=0.1287, pruned_loss=0.03536, audio_tagging_loss=0.009995, over 15803.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.1172, pruned_loss=0.03269, audio_tagging_loss=0.01142, over 3042569.11 frames. 
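In every `optim.py` line above, the logged threshold is exactly `Clipping_scale` times the middle of the five grad-norm quartiles (for instance 2.0 * 1.081e+02 = 2.162e+02), so the clipping threshold evidently tracks the median of recently observed gradient norms, and `percent-clipped` reports how often it actually bit (almost never here). A hedged sketch of that scheme; the window size and bookkeeping details are assumptions for illustration:

```python
import torch

class MedianGradClipper:
    """Clip the global grad norm at `scale` times the median of recently
    observed norms.  scale=2.0 matches the logged Clipping_scale and the
    threshold/median relationship above; the window size is an assumption."""
    def __init__(self, scale: float = 2.0, window: int = 400):
        self.scale = scale
        self.window = window
        self.norms: list[float] = []

    def clip_(self, parameters) -> float:
        grads = [p.grad for p in parameters if p.grad is not None]
        norm = torch.linalg.vector_norm(
            torch.stack([torch.linalg.vector_norm(g) for g in grads])).item()
        self.norms = (self.norms + [norm])[-self.window:]
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * q[2].item()   # 2.0 * median, as in the log
        if norm > threshold:
            for g in grads:
                g.mul_(threshold / norm)       # rescale to the threshold
        return threshold
```

The five numbers printed as "grad-norm quartiles" then correspond to the min/25%/50%/75%/max of the buffered norms.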
], batch size: 59, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:35:40,393 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.32 vs. limit=15.0 2023-11-18 17:35:41,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=335293.3333333333, ans=0.125 2023-11-18 17:35:45,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=335293.3333333333, ans=0.125 2023-11-18 17:35:57,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=335360.0, ans=0.125 2023-11-18 17:35:57,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=335360.0, ans=0.125 2023-11-18 17:36:21,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=335493.3333333333, ans=15.0 2023-11-18 17:36:21,295 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.35 vs. limit=15.0 2023-11-18 17:36:27,475 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.184e+01 9.433e+01 1.069e+02 1.154e+02 1.802e+02, threshold=2.138e+02, percent-clipped=0.0 2023-11-18 17:36:27,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=335560.0, ans=0.125 2023-11-18 17:36:32,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=335560.0, ans=0.125 2023-11-18 17:36:36,106 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 2250, loss[loss=0.08901, simple_loss=0.1046, pruned_loss=0.02484, audio_tagging_loss=0.01185, over 14690.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.1172, pruned_loss=0.03267, audio_tagging_loss=0.0114, over 3040508.49 frames. ], batch size: 57, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:36:36,579 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.85 vs. limit=6.0 2023-11-18 17:37:08,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=335760.0, ans=0.1 2023-11-18 17:37:09,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=335826.6666666667, ans=0.125 2023-11-18 17:37:14,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=335826.6666666667, ans=0.0 2023-11-18 17:37:18,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=335826.6666666667, ans=0.125 2023-11-18 17:37:19,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=335826.6666666667, ans=0.125 2023-11-18 17:37:33,140 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 2300, loss[loss=0.1339, simple_loss=0.1593, pruned_loss=0.04331, audio_tagging_loss=0.01092, over 15251.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.1172, pruned_loss=0.03266, audio_tagging_loss=0.01149, over 3040843.93 frames. 
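The `ScheduledFloat` lines report the current value (`ans`) of a named scalar hyper-parameter at the current `batch_count`; values such as the dropout probabilities logged at ans=0.1 here have long since settled at the end of their schedules. A minimal sketch of a piecewise-linear schedule of that shape; the breakpoints below are invented for illustration, not taken from this run:

```python
class PiecewiseLinear:
    """A scalar hyper-parameter that is a piecewise-linear function of
    batch_count, in the spirit of the ScheduledFloat values above.
    Breakpoints are illustrative assumptions."""
    def __init__(self, *points):
        self.points = sorted(points)   # (batch_count, value) pairs

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
        return pts[-1][1]

dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(335760.0))   # 0.1, as in the dropout_p lines above
```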
], batch size: 55, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:37:35,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=335960.0, ans=0.125 2023-11-18 17:37:55,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=336093.3333333333, ans=0.1 2023-11-18 17:38:01,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=336093.3333333333, ans=0.0 2023-11-18 17:38:06,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=336160.0, ans=0.125 2023-11-18 17:38:09,620 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 17:38:20,664 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 9.497e+01 1.027e+02 1.187e+02 1.652e+02, threshold=2.054e+02, percent-clipped=0.0 2023-11-18 17:38:22,767 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 17:38:29,673 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 2350, loss[loss=0.1165, simple_loss=0.142, pruned_loss=0.03572, audio_tagging_loss=0.009781, over 15654.00 frames. ], tot_loss[loss=0.1031, simple_loss=0.1177, pruned_loss=0.03276, audio_tagging_loss=0.01155, over 3048963.65 frames. ], batch size: 60, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:38:43,358 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.87 vs. limit=15.0 2023-11-18 17:38:56,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=336426.6666666667, ans=0.07 2023-11-18 17:39:15,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=336560.0, ans=0.125 2023-11-18 17:39:25,512 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 2400, loss[loss=0.08185, simple_loss=0.08909, pruned_loss=0.02384, audio_tagging_loss=0.01347, over 14760.00 frames. ], tot_loss[loss=0.1025, simple_loss=0.1168, pruned_loss=0.03246, audio_tagging_loss=0.01162, over 3051346.60 frames. ], batch size: 56, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:39:27,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=336626.6666666667, ans=0.125 2023-11-18 17:39:29,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=336626.6666666667, ans=0.1 2023-11-18 17:39:29,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=336626.6666666667, ans=0.125 2023-11-18 17:39:55,208 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. 
limit=15.0 2023-11-18 17:40:13,220 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.544e+01 9.087e+01 9.616e+01 1.102e+02 1.303e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-18 17:40:19,017 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2023-11-18 17:40:21,848 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 2450, loss[loss=0.08595, simple_loss=0.1054, pruned_loss=0.02327, audio_tagging_loss=0.00996, over 15462.00 frames. ], tot_loss[loss=0.1038, simple_loss=0.1184, pruned_loss=0.03292, audio_tagging_loss=0.01171, over 3055755.79 frames. ], batch size: 58, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:41:02,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=337160.0, ans=0.125 2023-11-18 17:41:17,247 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 2500, loss[loss=0.09595, simple_loss=0.0915, pruned_loss=0.0362, audio_tagging_loss=0.014, over 14491.00 frames. ], tot_loss[loss=0.1041, simple_loss=0.1188, pruned_loss=0.03295, audio_tagging_loss=0.01175, over 3051371.09 frames. ], batch size: 56, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:41:17,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=337293.3333333333, ans=0.1 2023-11-18 17:41:26,406 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.48 vs. limit=22.5 2023-11-18 17:41:31,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=337360.0, ans=0.0 2023-11-18 17:41:36,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=337360.0, ans=0.2 2023-11-18 17:41:37,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=337360.0, ans=0.2 2023-11-18 17:41:46,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=337426.6666666667, ans=0.95 2023-11-18 17:41:48,900 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.13 vs. limit=15.0 2023-11-18 17:42:05,673 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 9.278e+01 1.050e+02 1.171e+02 1.497e+02, threshold=2.099e+02, percent-clipped=0.0 2023-11-18 17:42:13,639 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 2550, loss[loss=0.08235, simple_loss=0.08523, pruned_loss=0.02457, audio_tagging_loss=0.01517, over 15673.00 frames. ], tot_loss[loss=0.103, simple_loss=0.1173, pruned_loss=0.03269, audio_tagging_loss=0.01171, over 3051699.62 frames. ], batch size: 58, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:42:19,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=337626.6666666667, ans=0.125 2023-11-18 17:42:29,232 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.08 vs. 
limit=15.0 2023-11-18 17:42:32,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=337693.3333333333, ans=0.0 2023-11-18 17:42:33,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=337693.3333333333, ans=0.125 2023-11-18 17:42:36,688 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.99 vs. limit=15.0 2023-11-18 17:42:39,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=337760.0, ans=0.125 2023-11-18 17:42:42,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=337760.0, ans=0.0 2023-11-18 17:42:50,511 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.89 vs. limit=15.0 2023-11-18 17:43:04,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=337893.3333333333, ans=0.125 2023-11-18 17:43:10,217 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 2600, loss[loss=0.1116, simple_loss=0.1295, pruned_loss=0.03658, audio_tagging_loss=0.01023, over 16523.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1168, pruned_loss=0.03228, audio_tagging_loss=0.01161, over 3049541.44 frames. ], batch size: 62, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:43:18,285 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.51 vs. limit=10.0 2023-11-18 17:43:34,084 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 17:43:42,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=338160.0, ans=0.125 2023-11-18 17:43:49,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=338160.0, ans=0.2 2023-11-18 17:43:57,878 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.138e+01 8.896e+01 9.646e+01 1.065e+02 1.578e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-18 17:44:05,248 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 2650, loss[loss=0.08831, simple_loss=0.08603, pruned_loss=0.02877, audio_tagging_loss=0.01652, over 13984.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1174, pruned_loss=0.0326, audio_tagging_loss=0.0115, over 3041803.81 frames. 
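The `Whitening` lines compare a per-module statistic (`metric`) against that module's `limit` (for example 10.89 vs. 15.0 above); while the metric stays under the limit the module passes activations through unchanged, and only above the limit does a corrective gradient apply. One way to read the metric is as a measure of how far the feature covariance is from a scaled identity. The formula below is a plausible stand-in with that behavior, not the exact statistic in scaling.py:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels).  Returns 1.0 when the feature
    covariance is a multiple of the identity ("white" features) and grows
    as the eigenvalue spectrum becomes more lopsided.  A plausible
    stand-in, assumed for illustration; scaling.py may differ in detail."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)   # real eigenvalues, ascending
    return (eigs ** 2).mean() / eigs.mean().pow(2).clamp(min=1e-20)
```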
], batch size: 55, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:44:08,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=338293.3333333333, ans=0.0 2023-11-18 17:44:08,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=338293.3333333333, ans=0.05 2023-11-18 17:44:09,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=338293.3333333333, ans=0.0 2023-11-18 17:44:11,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=338293.3333333333, ans=0.125 2023-11-18 17:44:19,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=338360.0, ans=0.125 2023-11-18 17:44:20,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=338360.0, ans=0.125 2023-11-18 17:44:28,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=338426.6666666667, ans=0.125 2023-11-18 17:45:01,127 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 2700, loss[loss=0.1014, simple_loss=0.1156, pruned_loss=0.03128, audio_tagging_loss=0.01228, over 15390.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1166, pruned_loss=0.03248, audio_tagging_loss=0.01146, over 3045962.19 frames. ], batch size: 58, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:45:20,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=338693.3333333333, ans=0.025 2023-11-18 17:45:22,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=338760.0, ans=0.125 2023-11-18 17:45:22,821 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=17.13 vs. limit=15.0 2023-11-18 17:45:39,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=338826.6666666667, ans=0.0 2023-11-18 17:45:46,496 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.02 vs. limit=15.0 2023-11-18 17:45:49,094 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.160e+01 8.914e+01 9.942e+01 1.124e+02 1.692e+02, threshold=1.988e+02, percent-clipped=0.0 2023-11-18 17:45:57,621 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 2750, loss[loss=0.1152, simple_loss=0.1359, pruned_loss=0.04068, audio_tagging_loss=0.006591, over 15097.00 frames. ], tot_loss[loss=0.1029, simple_loss=0.1176, pruned_loss=0.03272, audio_tagging_loss=0.01139, over 3037468.07 frames. 
], batch size: 55, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:45:58,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=338960.0, ans=0.125 2023-11-18 17:45:59,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=338960.0, ans=0.0 2023-11-18 17:46:04,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=338960.0, ans=0.0 2023-11-18 17:46:05,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=338960.0, ans=0.1 2023-11-18 17:46:07,675 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.11 vs. limit=12.0 2023-11-18 17:46:21,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=339093.3333333333, ans=0.2 2023-11-18 17:46:25,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=339093.3333333333, ans=0.0 2023-11-18 17:46:27,071 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.15 vs. limit=15.0 2023-11-18 17:46:29,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=339160.0, ans=0.125 2023-11-18 17:46:45,086 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 17:46:46,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=339226.6666666667, ans=0.125 2023-11-18 17:46:52,447 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 2800, loss[loss=0.1216, simple_loss=0.1487, pruned_loss=0.03968, audio_tagging_loss=0.007552, over 15743.00 frames. ], tot_loss[loss=0.1034, simple_loss=0.1182, pruned_loss=0.03291, audio_tagging_loss=0.01139, over 3038214.20 frames. ], batch size: 57, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:46:52,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=339293.3333333333, ans=0.125 2023-11-18 17:47:07,419 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2023-11-18 17:47:10,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=339360.0, ans=0.125 2023-11-18 17:47:17,512 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.06 vs. 
limit=22.5 2023-11-18 17:47:35,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=339560.0, ans=0.125 2023-11-18 17:47:39,926 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.932e+01 9.210e+01 1.044e+02 1.186e+02 2.162e+02, threshold=2.088e+02, percent-clipped=1.0 2023-11-18 17:47:47,895 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 2850, loss[loss=0.1334, simple_loss=0.1466, pruned_loss=0.05002, audio_tagging_loss=0.01008, over 14925.00 frames. ], tot_loss[loss=0.1033, simple_loss=0.1184, pruned_loss=0.03281, audio_tagging_loss=0.01132, over 3035266.26 frames. ], batch size: 57, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:47:57,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=339626.6666666667, ans=0.0 2023-11-18 17:48:07,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=339693.3333333333, ans=0.125 2023-11-18 17:48:16,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=339760.0, ans=0.1 2023-11-18 17:48:19,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=339760.0, ans=0.2 2023-11-18 17:48:29,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=339826.6666666667, ans=0.125 2023-11-18 17:48:41,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=339893.3333333333, ans=0.0 2023-11-18 17:48:44,341 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 2900, loss[loss=0.1258, simple_loss=0.131, pruned_loss=0.05053, audio_tagging_loss=0.009778, over 13975.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1172, pruned_loss=0.03241, audio_tagging_loss=0.01133, over 3030093.96 frames. ], batch size: 55, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:49:05,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=340093.3333333333, ans=0.1 2023-11-18 17:49:07,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=340093.3333333333, ans=0.1 2023-11-18 17:49:14,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=340093.3333333333, ans=0.0 2023-11-18 17:49:20,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=340160.0, ans=0.0 2023-11-18 17:49:33,079 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.551e+01 9.214e+01 1.048e+02 1.170e+02 1.772e+02, threshold=2.096e+02, percent-clipped=0.0 2023-11-18 17:49:40,489 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 2950, loss[loss=0.0966, simple_loss=0.1073, pruned_loss=0.03109, audio_tagging_loss=0.01185, over 17132.00 frames. ], tot_loss[loss=0.1032, simple_loss=0.1182, pruned_loss=0.03276, audio_tagging_loss=0.01132, over 3041823.57 frames. ], batch size: 65, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:49:54,634 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.57 vs. 
limit=10.0 2023-11-18 17:49:55,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=340360.0, ans=0.1 2023-11-18 17:50:12,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=340426.6666666667, ans=0.0 2023-11-18 17:50:15,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=340493.3333333333, ans=0.0 2023-11-18 17:50:36,777 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 3000, loss[loss=0.1015, simple_loss=0.1211, pruned_loss=0.03241, audio_tagging_loss=0.008525, over 16332.00 frames. ], tot_loss[loss=0.1037, simple_loss=0.1186, pruned_loss=0.03303, audio_tagging_loss=0.01132, over 3038079.36 frames. ], batch size: 59, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:50:36,778 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-18 17:50:53,839 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5826, 2.6515, 4.0278, 3.1117], device='cuda:3') 2023-11-18 17:51:06,513 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.2871, 3.9439, 4.3073, 4.3454], device='cuda:3') 2023-11-18 17:51:09,279 INFO [train_asr.py:1147] (3/4) Epoch 5, validation: loss=0.07345, simple_loss=0.06093, pruned_loss=0.009446, audio_tagging_loss=0.03354, over 4681554.00 frames. 2023-11-18 17:51:09,279 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-18 17:51:09,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=340626.6666666667, ans=0.05 2023-11-18 17:51:25,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=340693.3333333333, ans=0.1 2023-11-18 17:51:36,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=340760.0, ans=0.1 2023-11-18 17:51:36,824 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0 2023-11-18 17:51:48,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.89 vs. limit=10.0 2023-11-18 17:51:56,992 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.161e+01 9.087e+01 9.878e+01 1.115e+02 1.743e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-18 17:51:59,753 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.22 vs. limit=15.0 2023-11-18 17:52:04,469 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 3050, loss[loss=0.09829, simple_loss=0.1131, pruned_loss=0.03016, audio_tagging_loss=0.01159, over 16817.00 frames. ], tot_loss[loss=0.1038, simple_loss=0.1184, pruned_loss=0.03311, audio_tagging_loss=0.0115, over 3035362.12 frames. 
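The validation pass interleaved above runs the model over the fixed dev set (the same 4,681,554 frames every time, so validation losses are directly comparable across epochs) and also probes the entropy of selected self-attention weight distributions, apparently one value per head of the probed layer. A sketch of that diagnostic; the tensor layout is an assumption:

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    """attn: (num_heads, num_queries, num_keys), each row a probability
    distribution over keys.  Returns per-head entropy averaged over
    queries; values near log(num_keys) mean near-uniform heads, small
    values mean sharply peaked heads.  The shape is assumed for
    illustration."""
    p = attn.clamp(min=1e-20)
    entropy = -(p * p.log()).sum(dim=-1)   # (num_heads, num_queries)
    return entropy.mean(dim=-1)            # (num_heads,)
```

Under that reading, the tensor [4.5826, 2.6515, 4.0278, 3.1117] logged above would indicate one fairly diffuse head and one markedly peakier head in the same layer.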
], batch size: 64, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:52:11,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=340960.0, ans=0.0 2023-11-18 17:52:11,361 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2023-11-18 17:52:14,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=341026.6666666667, ans=0.125 2023-11-18 17:52:18,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=341026.6666666667, ans=0.0 2023-11-18 17:52:29,862 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0 2023-11-18 17:52:36,786 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 17:52:43,644 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0 2023-11-18 17:52:51,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=341226.6666666667, ans=0.0 2023-11-18 17:52:59,728 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 3100, loss[loss=0.09048, simple_loss=0.09464, pruned_loss=0.02988, audio_tagging_loss=0.01328, over 16166.00 frames. ], tot_loss[loss=0.1033, simple_loss=0.1178, pruned_loss=0.03281, audio_tagging_loss=0.01161, over 3037034.60 frames. ], batch size: 60, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:53:08,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=341293.3333333333, ans=0.125 2023-11-18 17:53:12,185 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.13 vs. limit=10.0 2023-11-18 17:53:21,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=341426.6666666667, ans=10.0 2023-11-18 17:53:26,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=341426.6666666667, ans=0.125 2023-11-18 17:53:35,718 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0 2023-11-18 17:53:43,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=341560.0, ans=0.035 2023-11-18 17:53:43,812 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.25 vs. 
limit=15.0 2023-11-18 17:53:46,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=341560.0, ans=0.0 2023-11-18 17:53:47,379 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.507e+01 9.303e+01 9.886e+01 1.114e+02 1.331e+02, threshold=1.977e+02, percent-clipped=0.0 2023-11-18 17:53:48,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=341560.0, ans=0.125 2023-11-18 17:53:50,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=341560.0, ans=0.1 2023-11-18 17:53:51,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=341560.0, ans=0.125 2023-11-18 17:53:55,383 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 3150, loss[loss=0.1224, simple_loss=0.1424, pruned_loss=0.04146, audio_tagging_loss=0.009693, over 16026.00 frames. ], tot_loss[loss=0.1032, simple_loss=0.1178, pruned_loss=0.03258, audio_tagging_loss=0.0117, over 3045202.71 frames. ], batch size: 62, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:54:06,672 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.65 vs. limit=15.0 2023-11-18 17:54:43,871 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.83 vs. limit=15.0 2023-11-18 17:54:48,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=341893.3333333333, ans=0.125 2023-11-18 17:54:51,847 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 3200, loss[loss=0.1079, simple_loss=0.1219, pruned_loss=0.03333, audio_tagging_loss=0.01361, over 15131.00 frames. ], tot_loss[loss=0.1039, simple_loss=0.1183, pruned_loss=0.03295, audio_tagging_loss=0.01177, over 3042627.64 frames. ], batch size: 56, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:54:59,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=341960.0, ans=0.125 2023-11-18 17:55:13,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=342093.3333333333, ans=0.1 2023-11-18 17:55:15,116 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.74 vs. limit=15.0 2023-11-18 17:55:36,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=342226.6666666667, ans=0.02 2023-11-18 17:55:39,558 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.324e+01 9.174e+01 9.896e+01 1.084e+02 1.894e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-18 17:55:47,525 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 3250, loss[loss=0.09171, simple_loss=0.09821, pruned_loss=0.02726, audio_tagging_loss=0.01535, over 15113.00 frames. ], tot_loss[loss=0.1026, simple_loss=0.1164, pruned_loss=0.03237, audio_tagging_loss=0.01206, over 3040712.08 frames. 
], batch size: 59, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:55:48,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=342293.3333333333, ans=0.0 2023-11-18 17:55:56,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=342293.3333333333, ans=0.125 2023-11-18 17:56:04,100 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.07 vs. limit=15.0 2023-11-18 17:56:07,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=342360.0, ans=0.1 2023-11-18 17:56:21,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=342493.3333333333, ans=0.125 2023-11-18 17:56:42,534 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 3300, loss[loss=0.1438, simple_loss=0.1585, pruned_loss=0.05548, audio_tagging_loss=0.009106, over 16089.00 frames. ], tot_loss[loss=0.1044, simple_loss=0.1181, pruned_loss=0.03327, audio_tagging_loss=0.01206, over 3038978.36 frames. ], batch size: 58, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:56:50,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=342626.6666666667, ans=0.025 2023-11-18 17:56:55,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=342693.3333333333, ans=0.07 2023-11-18 17:57:00,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=342693.3333333333, ans=0.125 2023-11-18 17:57:27,181 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.64 vs. limit=15.0 2023-11-18 17:57:31,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=342893.3333333333, ans=0.125 2023-11-18 17:57:31,971 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.617e+01 9.162e+01 1.022e+02 1.144e+02 1.543e+02, threshold=2.045e+02, percent-clipped=0.0 2023-11-18 17:57:39,985 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 3350, loss[loss=0.09251, simple_loss=0.09808, pruned_loss=0.03093, audio_tagging_loss=0.01254, over 15387.00 frames. ], tot_loss[loss=0.104, simple_loss=0.1179, pruned_loss=0.03319, audio_tagging_loss=0.01185, over 3041082.71 frames. 
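The constant `grad_scale: 32.0` in these lines looks like the dynamic loss scale of mixed-precision training sitting at a stable value; an fp16 overflow would halve it, which is consistent with the drop to `grad_scale: 16.0` later in this epoch. A minimal sketch of the usual PyTorch AMP step that produces such a scale; the tiny model is a stand-in and a GPU is assumed:

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(80, 500).cuda()     # stand-in model, assumes a GPU
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
scaler = torch.cuda.amp.GradScaler()        # dynamic loss scaling

def training_step(feats: torch.Tensor, targets: torch.Tensor) -> float:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():         # fp16 forward pass
        loss = F.cross_entropy(model(feats), targets)
    scaler.scale(loss).backward()           # backward on the scaled loss
    scaler.step(optimizer)                  # skips the update on inf/nan
    scaler.update()                         # halves the scale after overflow
    return scaler.get_scale()               # the grad_scale seen in the log
```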
], batch size: 59, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:57:43,470 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.330e-01 2023-11-18 17:57:54,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=343026.6666666667, ans=0.125 2023-11-18 17:58:07,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=343093.3333333333, ans=0.0 2023-11-18 17:58:24,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=343226.6666666667, ans=0.125 2023-11-18 17:58:30,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=343226.6666666667, ans=0.07 2023-11-18 17:58:35,866 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 3400, loss[loss=0.06031, simple_loss=0.05137, pruned_loss=0.0194, audio_tagging_loss=0.01523, over 14520.00 frames. ], tot_loss[loss=0.104, simple_loss=0.1183, pruned_loss=0.03321, audio_tagging_loss=0.0116, over 3037258.13 frames. ], batch size: 57, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:58:36,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=343293.3333333333, ans=0.0 2023-11-18 17:58:46,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=343360.0, ans=0.0 2023-11-18 17:58:57,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=343426.6666666667, ans=0.05 2023-11-18 17:59:16,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=343493.3333333333, ans=0.125 2023-11-18 17:59:17,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=343493.3333333333, ans=0.125 2023-11-18 17:59:23,778 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 9.786e+01 1.073e+02 1.222e+02 1.705e+02, threshold=2.147e+02, percent-clipped=0.0 2023-11-18 17:59:27,263 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 17:59:27,696 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.86 vs. limit=15.0 2023-11-18 17:59:31,126 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 3450, loss[loss=0.07195, simple_loss=0.08184, pruned_loss=0.01946, audio_tagging_loss=0.01158, over 14404.00 frames. ], tot_loss[loss=0.1029, simple_loss=0.117, pruned_loss=0.03278, audio_tagging_loss=0.01157, over 3032037.16 frames. ], batch size: 56, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:59:35,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=343626.6666666667, ans=0.1 2023-11-18 17:59:38,304 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.38 vs. 
limit=12.0 2023-11-18 18:00:02,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=343760.0, ans=0.0 2023-11-18 18:00:03,847 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.09 vs. limit=12.0 2023-11-18 18:00:04,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=343826.6666666667, ans=0.125 2023-11-18 18:00:06,716 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.34 vs. limit=15.0 2023-11-18 18:00:20,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=343893.3333333333, ans=0.0 2023-11-18 18:00:27,487 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 3500, loss[loss=0.1613, simple_loss=0.1872, pruned_loss=0.05722, audio_tagging_loss=0.01047, over 16048.00 frames. ], tot_loss[loss=0.104, simple_loss=0.1187, pruned_loss=0.0332, audio_tagging_loss=0.01147, over 3042935.69 frames. ], batch size: 57, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:00:45,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=344026.6666666667, ans=0.1 2023-11-18 18:00:55,475 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 18:00:57,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=344093.3333333333, ans=0.0 2023-11-18 18:01:05,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=344160.0, ans=0.125 2023-11-18 18:01:16,269 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.245e+01 9.216e+01 1.044e+02 1.195e+02 1.654e+02, threshold=2.089e+02, percent-clipped=0.0 2023-11-18 18:01:19,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=344226.6666666667, ans=0.2 2023-11-18 18:01:23,672 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 3550, loss[loss=0.06694, simple_loss=0.07744, pruned_loss=0.01577, audio_tagging_loss=0.01246, over 14945.00 frames. ], tot_loss[loss=0.1031, simple_loss=0.1177, pruned_loss=0.0329, audio_tagging_loss=0.01136, over 3037782.04 frames. 
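The WARNING just above (and its siblings throughout this epoch) is a length filter at work: after the encoder's roughly 4x subsampling, a 100-frame AudioSet cut keeps only 23 frames, too few to align the 24 BPE tokens of the dummy transcript, so the transducer loss could not be computed and the cut is dropped. A sketch of the predicate; the exact subsampling formula is an assumption, chosen because it reproduces the logged 100 -> 23:

```python
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Drop cuts whose encoder output is too short to align the token
    sequence.  The subsampling formula is an assumption; it maps the 100
    input frames in the log to the reported 23 output frames."""
    frames_after = ((num_frames - 7) // 2 + 1) // 2   # ~4x subsampling
    return frames_after >= num_tokens

assert keep_cut(100, 24) is False   # the excluded AudioSet dummy-text cuts
```

Since every excluded cut is a 1-second unbalanced-AudioSet clip carrying the same placeholder transcript, these warnings are expected noise rather than data loss.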
], batch size: 56, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:01:27,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=344293.3333333333, ans=0.07 2023-11-18 18:01:28,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=344293.3333333333, ans=0.125 2023-11-18 18:01:32,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=344293.3333333333, ans=0.0 2023-11-18 18:01:35,296 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.43 vs. limit=10.0 2023-11-18 18:01:46,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=344426.6666666667, ans=0.125 2023-11-18 18:01:54,452 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.71 vs. limit=12.0 2023-11-18 18:02:19,432 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 3600, loss[loss=0.08941, simple_loss=0.1059, pruned_loss=0.02691, audio_tagging_loss=0.009547, over 15360.00 frames. ], tot_loss[loss=0.103, simple_loss=0.1176, pruned_loss=0.0329, audio_tagging_loss=0.01131, over 3040889.89 frames. ], batch size: 57, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:02:25,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=344626.6666666667, ans=0.2 2023-11-18 18:02:47,451 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.98 vs. limit=15.0 2023-11-18 18:02:47,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=344760.0, ans=0.125 2023-11-18 18:03:08,094 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.942e+01 9.105e+01 1.020e+02 1.125e+02 1.503e+02, threshold=2.039e+02, percent-clipped=0.0 2023-11-18 18:03:16,171 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 3650, loss[loss=0.1059, simple_loss=0.1246, pruned_loss=0.0356, audio_tagging_loss=0.007975, over 15558.00 frames. ], tot_loss[loss=0.1038, simple_loss=0.1185, pruned_loss=0.03325, audio_tagging_loss=0.01131, over 3042295.57 frames. ], batch size: 56, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:03:35,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=345026.6666666667, ans=0.0 2023-11-18 18:03:38,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=345093.3333333333, ans=0.1 2023-11-18 18:04:00,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=345226.6666666667, ans=0.125 2023-11-18 18:04:01,229 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.46 vs. 
limit=15.0 2023-11-18 18:04:05,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=345226.6666666667, ans=0.125 2023-11-18 18:04:11,794 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 3700, loss[loss=0.07858, simple_loss=0.0822, pruned_loss=0.02166, audio_tagging_loss=0.01582, over 15944.00 frames. ], tot_loss[loss=0.1033, simple_loss=0.1179, pruned_loss=0.03301, audio_tagging_loss=0.0113, over 3047821.29 frames. ], batch size: 60, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:04:22,647 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.35 vs. limit=22.5 2023-11-18 18:05:00,335 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.024e+01 9.436e+01 1.012e+02 1.107e+02 1.712e+02, threshold=2.024e+02, percent-clipped=0.0 2023-11-18 18:05:07,825 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 3750, loss[loss=0.1008, simple_loss=0.1067, pruned_loss=0.03422, audio_tagging_loss=0.01324, over 14901.00 frames. ], tot_loss[loss=0.1031, simple_loss=0.1177, pruned_loss=0.03296, audio_tagging_loss=0.01137, over 3053492.04 frames. ], batch size: 56, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:05:14,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=345626.6666666667, ans=0.0 2023-11-18 18:05:21,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=345693.3333333333, ans=0.1 2023-11-18 18:05:46,201 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 18:05:56,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=345893.3333333333, ans=0.125 2023-11-18 18:06:04,348 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 3800, loss[loss=0.1206, simple_loss=0.1446, pruned_loss=0.03922, audio_tagging_loss=0.009143, over 15283.00 frames. ], tot_loss[loss=0.1026, simple_loss=0.1171, pruned_loss=0.03265, audio_tagging_loss=0.0114, over 3052667.09 frames. ], batch size: 57, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:06:14,722 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2023-11-18 18:06:43,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=346160.0, ans=0.125 2023-11-18 18:06:50,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=346226.6666666667, ans=0.125 2023-11-18 18:06:52,214 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.441e+01 9.330e+01 1.017e+02 1.159e+02 1.442e+02, threshold=2.034e+02, percent-clipped=0.0 2023-11-18 18:06:59,705 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 3850, loss[loss=0.06565, simple_loss=0.06511, pruned_loss=0.01674, audio_tagging_loss=0.01635, over 15596.00 frames. 
], tot_loss[loss=0.1033, simple_loss=0.118, pruned_loss=0.03286, audio_tagging_loss=0.01142, over 3058046.23 frames. ], batch size: 61, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:07:13,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=346360.0, ans=0.0 2023-11-18 18:07:15,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=346360.0, ans=6.0 2023-11-18 18:07:25,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=346426.6666666667, ans=0.125 2023-11-18 18:07:42,385 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.66 vs. limit=15.0 2023-11-18 18:07:45,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=346560.0, ans=0.0 2023-11-18 18:07:55,526 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 3900, loss[loss=0.06532, simple_loss=0.06943, pruned_loss=0.01746, audio_tagging_loss=0.01315, over 14401.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.1166, pruned_loss=0.03224, audio_tagging_loss=0.01156, over 3051099.08 frames. ], batch size: 55, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:08:06,447 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.78 vs. limit=8.0 2023-11-18 18:08:33,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=346826.6666666667, ans=0.07 2023-11-18 18:08:45,470 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.301e+01 9.163e+01 1.014e+02 1.129e+02 1.556e+02, threshold=2.028e+02, percent-clipped=0.0 2023-11-18 18:08:53,953 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 3950, loss[loss=0.1099, simple_loss=0.1232, pruned_loss=0.03549, audio_tagging_loss=0.01276, over 14960.00 frames. ], tot_loss[loss=0.1029, simple_loss=0.1172, pruned_loss=0.03265, audio_tagging_loss=0.01164, over 3048973.66 frames. ], batch size: 56, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:08:54,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=346960.0, ans=0.07 2023-11-18 18:09:02,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=346960.0, ans=0.125 2023-11-18 18:09:24,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=347093.3333333333, ans=0.025 2023-11-18 18:09:49,357 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 4000, loss[loss=0.09802, simple_loss=0.1111, pruned_loss=0.02846, audio_tagging_loss=0.01401, over 15045.00 frames. ], tot_loss[loss=0.104, simple_loss=0.1184, pruned_loss=0.03318, audio_tagging_loss=0.01163, over 3051899.54 frames. 
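Note the two bookkeeping scopes in each training line: `loss[...]` is the current batch (about 15k frames), while `tot_loss[...]` is a slow-moving figure over roughly 3 million frames. A frame-weighted running average with a mild decay reproduces that window; the decay factor below is an assumption, picked so that ~15k frames per batch gives a steady-state window near the ~3.05M frames in the log:

```python
class RunningLoss:
    """Frame-weighted running average of loss components, in the spirit
    of the tot_loss[...] figures above.  decay=0.995 is an assumption:
    with ~15k frames per batch it yields an effective window of about
    15000 / (1 - 0.995) = 3e6 frames, close to the logged totals."""
    def __init__(self, decay: float = 0.995):
        self.decay = decay
        self.frames = 0.0
        self.weighted: dict[str, float] = {}   # name -> sum(loss * frames)

    def update(self, losses: dict, num_frames: float) -> dict:
        self.frames = self.decay * self.frames + num_frames
        out = {}
        for name, value in losses.items():
            acc = self.decay * self.weighted.get(name, 0.0) + value * num_frames
            self.weighted[name] = acc
            out[name] = acc / self.frames
        return out

tracker = RunningLoss()
print(tracker.update({"loss": 0.1099, "simple_loss": 0.1232}, 14960))
```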
], batch size: 57, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:09:51,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=347293.3333333333, ans=0.2 2023-11-18 18:09:53,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=347293.3333333333, ans=0.125 2023-11-18 18:09:57,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=347293.3333333333, ans=0.125 2023-11-18 18:10:06,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=347360.0, ans=0.125 2023-11-18 18:10:07,117 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.65 vs. limit=15.0 2023-11-18 18:10:34,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=347560.0, ans=0.0 2023-11-18 18:10:34,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=347560.0, ans=0.125 2023-11-18 18:10:37,501 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 9.209e+01 1.022e+02 1.112e+02 1.476e+02, threshold=2.045e+02, percent-clipped=0.0 2023-11-18 18:10:38,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=347560.0, ans=0.0 2023-11-18 18:10:40,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=347560.0, ans=0.125 2023-11-18 18:10:46,041 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 4050, loss[loss=0.1247, simple_loss=0.148, pruned_loss=0.0362, audio_tagging_loss=0.01453, over 15215.00 frames. ], tot_loss[loss=0.1029, simple_loss=0.1173, pruned_loss=0.03252, audio_tagging_loss=0.01174, over 3050020.02 frames. ], batch size: 57, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:10:46,617 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.66 vs. limit=15.0 2023-11-18 18:10:47,153 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 18:10:57,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=347693.3333333333, ans=0.0 2023-11-18 18:10:59,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=347693.3333333333, ans=0.2 2023-11-18 18:11:42,704 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 4100, loss[loss=0.08699, simple_loss=0.1012, pruned_loss=0.02597, audio_tagging_loss=0.01045, over 13579.00 frames. ], tot_loss[loss=0.1033, simple_loss=0.1179, pruned_loss=0.03266, audio_tagging_loss=0.01165, over 3056816.02 frames. 
], batch size: 54, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:11:49,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=347960.0, ans=0.07 2023-11-18 18:11:50,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=347960.0, ans=0.2 2023-11-18 18:12:02,502 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.14 vs. limit=15.0 2023-11-18 18:12:32,086 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.596e+01 9.111e+01 1.020e+02 1.147e+02 2.406e+02, threshold=2.040e+02, percent-clipped=1.0 2023-11-18 18:12:33,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=348226.6666666667, ans=0.125 2023-11-18 18:12:37,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=348293.3333333333, ans=0.125 2023-11-18 18:12:38,566 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 4150, loss[loss=0.1265, simple_loss=0.1497, pruned_loss=0.04361, audio_tagging_loss=0.008045, over 15574.00 frames. ], tot_loss[loss=0.1035, simple_loss=0.1183, pruned_loss=0.03281, audio_tagging_loss=0.0115, over 3052564.16 frames. ], batch size: 57, lr: 1.38e-02, grad_scale: 16.0 2023-11-18 18:13:02,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=348426.6666666667, ans=0.125 2023-11-18 18:13:07,826 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.52 vs. limit=22.5 2023-11-18 18:13:11,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=348493.3333333333, ans=0.2 2023-11-18 18:13:16,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=348493.3333333333, ans=0.0 2023-11-18 18:13:17,894 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 18:13:19,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=348493.3333333333, ans=0.0 2023-11-18 18:13:27,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=348560.0, ans=0.07 2023-11-18 18:13:34,506 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 4200, loss[loss=0.09149, simple_loss=0.09631, pruned_loss=0.03327, audio_tagging_loss=0.01006, over 14178.00 frames. ], tot_loss[loss=0.104, simple_loss=0.1191, pruned_loss=0.03316, audio_tagging_loss=0.0113, over 3048315.02 frames. 
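Annotation on the optim.py lines above: the five quartile values read as min / 25% / median / 75% / max of recently observed gradient norms, and the clipping threshold tracks Clipping_scale times the median (2.0 x 1.020e+02 = 2.040e+02 in the entry above, where the max of 2.406e+02 exceeded it and percent-clipped rose to 1.0). A sketch of such an adaptive clipper; the window length and the plain-deque bookkeeping are assumptions:

```python
# Sketch of the adaptive gradient-norm clipping reported by optim.py.
from collections import deque

import numpy as np

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)   # recent gradient norms
        self.num_clipped = 0
        self.num_steps = 0

    def step(self, grad_norm: float) -> float:
        """Record one gradient norm; return the factor to scale grads by."""
        self.norms.append(grad_norm)
        self.num_steps += 1
        threshold = self.clipping_scale * float(np.median(self.norms))
        if grad_norm > threshold:
            self.num_clipped += 1
            return threshold / grad_norm    # shrink the update
        return 1.0

    def stats(self):
        # min / 25% / median / 75% / max, as printed in the log lines above.
        qs = np.quantile(np.array(self.norms), [0.0, 0.25, 0.5, 0.75, 1.0])
        pct = 100.0 * self.num_clipped / max(1, self.num_steps)
        return qs, pct
```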
], batch size: 55, lr: 1.38e-02, grad_scale: 16.0 2023-11-18 18:13:35,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=348626.6666666667, ans=0.125 2023-11-18 18:13:39,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=348626.6666666667, ans=0.125 2023-11-18 18:13:44,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=348693.3333333333, ans=0.125 2023-11-18 18:13:50,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=348693.3333333333, ans=0.125 2023-11-18 18:13:57,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=348760.0, ans=0.125 2023-11-18 18:14:02,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=348760.0, ans=15.0 2023-11-18 18:14:08,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=348826.6666666667, ans=0.125 2023-11-18 18:14:16,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=348826.6666666667, ans=0.0 2023-11-18 18:14:22,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=348893.3333333333, ans=0.125 2023-11-18 18:14:23,314 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.393e+01 9.025e+01 9.781e+01 1.065e+02 1.508e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-18 18:14:30,709 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 4250, loss[loss=0.09299, simple_loss=0.1042, pruned_loss=0.02929, audio_tagging_loss=0.0116, over 15818.00 frames. ], tot_loss[loss=0.1047, simple_loss=0.1202, pruned_loss=0.0334, audio_tagging_loss=0.01118, over 3049336.35 frames. ], batch size: 58, lr: 1.38e-02, grad_scale: 16.0 2023-11-18 18:14:39,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=348960.0, ans=0.125 2023-11-18 18:14:42,217 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:14:49,902 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.07 vs. limit=12.0 2023-11-18 18:14:53,377 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.26 vs. 
limit=15.0 2023-11-18 18:14:58,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=349093.3333333333, ans=0.2 2023-11-18 18:15:07,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=349160.0, ans=0.04949747468305833 2023-11-18 18:15:18,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=349226.6666666667, ans=0.0 2023-11-18 18:15:25,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=349293.3333333333, ans=0.125 2023-11-18 18:15:26,397 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 4300, loss[loss=0.09675, simple_loss=0.1046, pruned_loss=0.03391, audio_tagging_loss=0.01051, over 15957.00 frames. ], tot_loss[loss=0.1046, simple_loss=0.1203, pruned_loss=0.03342, audio_tagging_loss=0.01099, over 3051652.88 frames. ], batch size: 63, lr: 1.37e-02, grad_scale: 16.0 2023-11-18 18:15:37,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=349360.0, ans=0.125 2023-11-18 18:15:41,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=349360.0, ans=0.0 2023-11-18 18:15:57,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=349426.6666666667, ans=0.09899494936611666 2023-11-18 18:16:10,078 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:16:13,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=349560.0, ans=0.05 2023-11-18 18:16:15,088 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.043e+01 8.907e+01 1.023e+02 1.143e+02 1.661e+02, threshold=2.046e+02, percent-clipped=0.0 2023-11-18 18:16:16,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=349560.0, ans=0.05 2023-11-18 18:16:21,908 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 4350, loss[loss=0.06869, simple_loss=0.0813, pruned_loss=0.01418, audio_tagging_loss=0.01386, over 14776.00 frames. ], tot_loss[loss=0.1047, simple_loss=0.1206, pruned_loss=0.03338, audio_tagging_loss=0.011, over 3051528.74 frames. ], batch size: 56, lr: 1.37e-02, grad_scale: 16.0 2023-11-18 18:16:24,650 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.32 vs. limit=15.0 2023-11-18 18:16:36,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=349693.3333333333, ans=0.035 2023-11-18 18:16:43,369 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.32 vs. 
limit=15.0 2023-11-18 18:16:48,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=349760.0, ans=0.125 2023-11-18 18:16:50,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=349760.0, ans=0.0 2023-11-18 18:17:02,536 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.33 vs. limit=15.0 2023-11-18 18:17:07,339 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.54 vs. limit=15.0 2023-11-18 18:17:12,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=349893.3333333333, ans=0.0 2023-11-18 18:17:17,992 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 4400, loss[loss=0.112, simple_loss=0.1195, pruned_loss=0.04067, audio_tagging_loss=0.01157, over 15129.00 frames. ], tot_loss[loss=0.1038, simple_loss=0.1195, pruned_loss=0.033, audio_tagging_loss=0.0111, over 3046056.89 frames. ], batch size: 57, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:17:25,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=349960.0, ans=0.035 2023-11-18 18:17:27,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=349960.0, ans=0.0 2023-11-18 18:17:34,042 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.43 vs. limit=6.0 2023-11-18 18:17:39,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=350093.3333333333, ans=0.125 2023-11-18 18:17:46,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=350093.3333333333, ans=0.05 2023-11-18 18:18:00,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=350160.0, ans=0.0 2023-11-18 18:18:07,464 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 9.072e+01 1.020e+02 1.136e+02 1.526e+02, threshold=2.040e+02, percent-clipped=0.0 2023-11-18 18:18:13,851 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 4450, loss[loss=0.08461, simple_loss=0.09231, pruned_loss=0.02587, audio_tagging_loss=0.01259, over 16238.00 frames. ], tot_loss[loss=0.1033, simple_loss=0.1185, pruned_loss=0.03285, audio_tagging_loss=0.01123, over 3050785.11 frames. 
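Annotation: the grad_scale logged with every batch moves in powers of two (16.0 around batch 4150 above, back to 32.0 by batch 4400), which is the signature of dynamic loss scaling in mixed-precision training: the scale is halved when gradients overflow and grown back after enough clean steps. A minimal PyTorch AMP loop under that assumption; the model and the growth settings below are illustrative, not this run's values:

```python
# Sketch: dynamic loss scaling consistent with the logged grad_scale.
import torch

model = torch.nn.Linear(80, 500).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0,
                                   growth_factor=2.0,   # double when stable
                                   backoff_factor=0.5,  # halve on overflow
                                   growth_interval=200)

for _ in range(10):
    x = torch.randn(16, 80, device="cuda")
    with torch.cuda.amp.autocast():
        loss = model(x).square().mean()
    opt.zero_grad()
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(opt)                # skips the step if grads overflowed
    scaler.update()                 # halves or grows the scale
    print("grad_scale:", scaler.get_scale())
```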
], batch size: 61, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:18:32,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=350360.0, ans=0.125 2023-11-18 18:18:37,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=350426.6666666667, ans=0.125 2023-11-18 18:18:39,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=350426.6666666667, ans=0.1 2023-11-18 18:18:48,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=350493.3333333333, ans=0.1 2023-11-18 18:18:52,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=350493.3333333333, ans=0.0 2023-11-18 18:19:08,906 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 4500, loss[loss=0.09716, simple_loss=0.1122, pruned_loss=0.02857, audio_tagging_loss=0.01249, over 15155.00 frames. ], tot_loss[loss=0.1037, simple_loss=0.1193, pruned_loss=0.03287, audio_tagging_loss=0.01117, over 3053953.71 frames. ], batch size: 57, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:19:13,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=350626.6666666667, ans=0.125 2023-11-18 18:19:23,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=350693.3333333333, ans=0.1 2023-11-18 18:19:30,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=350693.3333333333, ans=15.0 2023-11-18 18:19:39,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=350760.0, ans=0.0 2023-11-18 18:19:41,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=350760.0, ans=0.125 2023-11-18 18:19:42,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=350826.6666666667, ans=0.0 2023-11-18 18:19:43,879 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.14 vs. limit=12.0 2023-11-18 18:19:45,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=350826.6666666667, ans=0.0 2023-11-18 18:19:48,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=350826.6666666667, ans=0.0 2023-11-18 18:19:58,692 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.001e+01 9.243e+01 1.009e+02 1.115e+02 1.767e+02, threshold=2.018e+02, percent-clipped=0.0 2023-11-18 18:20:05,091 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 4550, loss[loss=0.105, simple_loss=0.1203, pruned_loss=0.03404, audio_tagging_loss=0.0108, over 15238.00 frames. ], tot_loss[loss=0.1037, simple_loss=0.1192, pruned_loss=0.03298, audio_tagging_loss=0.01112, over 3051631.59 frames. 
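Annotation on the many ScheduledFloat lines above: each named hyperparameter (dropout_p, skip_rate, prob, ...) is evaluated as a function of batch_count, and ans= is its current value; e.g. the bypass.skip_rate entries in this stretch still read ans=0.07. A piecewise-linear schedule is one simple way to get this behavior; the breakpoints below are invented for illustration and are not this run's schedule:

```python
# Sketch: a batch-count-dependent hyperparameter, as in the
# ScheduledFloat log lines above.

class PiecewiseLinear:
    """Value interpolated linearly between (batch_count, value) knots."""
    def __init__(self, *points):
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# Invented schedule: a skip rate annealing from 0.2 to 0.07 over the
# first 20k batches, then held constant.
skip_rate = PiecewiseLinear((0.0, 0.2), (20000.0, 0.07))
print(skip_rate(10000.0))    # 0.135, halfway down the ramp
print(skip_rate(347960.0))   # 0.07, matching the ans= values logged above
```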
], batch size: 56, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:20:05,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=350960.0, ans=0.125 2023-11-18 18:20:23,085 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=15.0 2023-11-18 18:20:23,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=351026.6666666667, ans=0.1 2023-11-18 18:20:23,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=351026.6666666667, ans=0.125 2023-11-18 18:20:41,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=351160.0, ans=0.125 2023-11-18 18:20:46,551 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 18:21:02,024 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 4600, loss[loss=0.1154, simple_loss=0.1376, pruned_loss=0.03278, audio_tagging_loss=0.01378, over 15933.00 frames. ], tot_loss[loss=0.1034, simple_loss=0.1185, pruned_loss=0.0329, audio_tagging_loss=0.01121, over 3050726.72 frames. ], batch size: 56, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:21:09,649 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:21:21,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=351360.0, ans=0.1 2023-11-18 18:21:35,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=351493.3333333333, ans=0.125 2023-11-18 18:21:38,197 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.31 vs. limit=15.0 2023-11-18 18:21:50,998 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.856e+01 9.307e+01 1.009e+02 1.129e+02 1.665e+02, threshold=2.017e+02, percent-clipped=0.0 2023-11-18 18:21:57,356 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 4650, loss[loss=0.1238, simple_loss=0.1405, pruned_loss=0.03901, audio_tagging_loss=0.01457, over 16261.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1168, pruned_loss=0.03249, audio_tagging_loss=0.01141, over 3050797.64 frames. ], batch size: 61, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:22:02,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=351626.6666666667, ans=0.125 2023-11-18 18:22:26,878 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.19 vs. 
limit=15.0 2023-11-18 18:22:30,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=351826.6666666667, ans=0.0 2023-11-18 18:22:38,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=351826.6666666667, ans=0.2 2023-11-18 18:22:52,864 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 4700, loss[loss=0.1086, simple_loss=0.1252, pruned_loss=0.03557, audio_tagging_loss=0.01046, over 15901.00 frames. ], tot_loss[loss=0.1025, simple_loss=0.1166, pruned_loss=0.03258, audio_tagging_loss=0.01159, over 3045862.91 frames. ], batch size: 61, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:23:11,543 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.63 vs. limit=22.5 2023-11-18 18:23:23,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=352093.3333333333, ans=0.1 2023-11-18 18:23:35,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=352160.0, ans=0.0 2023-11-18 18:23:42,222 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.568e+01 9.156e+01 9.801e+01 1.107e+02 1.485e+02, threshold=1.960e+02, percent-clipped=0.0 2023-11-18 18:23:49,047 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 4750, loss[loss=0.0999, simple_loss=0.1179, pruned_loss=0.02803, audio_tagging_loss=0.01293, over 15027.00 frames. ], tot_loss[loss=0.1022, simple_loss=0.1163, pruned_loss=0.03244, audio_tagging_loss=0.01162, over 3045186.30 frames. ], batch size: 55, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:23:53,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=352293.3333333333, ans=0.0 2023-11-18 18:23:59,874 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.96 vs. limit=15.0 2023-11-18 18:24:19,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=352426.6666666667, ans=0.125 2023-11-18 18:24:25,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=352493.3333333333, ans=0.0 2023-11-18 18:24:26,229 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.08 vs. limit=22.5 2023-11-18 18:24:27,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=352493.3333333333, ans=0.125 2023-11-18 18:24:30,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=352493.3333333333, ans=0.1 2023-11-18 18:24:33,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=352560.0, ans=0.0 2023-11-18 18:24:45,201 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 4800, loss[loss=0.09692, simple_loss=0.1128, pruned_loss=0.02805, audio_tagging_loss=0.01249, over 15016.00 frames. ], tot_loss[loss=0.1031, simple_loss=0.1174, pruned_loss=0.0327, audio_tagging_loss=0.01168, over 3054232.62 frames. 
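Annotation on the Whitening lines above (e.g. "metric=21.63 vs. limit=22.5"): the module measures how far a layer's activation covariance is from isotropic and intervenes when the metric exceeds its limit. The statistic below is one standard formulation, equal to 1.0 for perfectly white activations and larger otherwise; the exact formula in scaling.py may differ:

```python
# Sketch: a whitening metric of the kind compared against a limit above.
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels) activations for one group."""
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]          # (C, C) channel covariance
    c_dim = cov.shape[0]
    # c_dim * sum(eigenvalues^2) / sum(eigenvalues)^2: exactly 1.0 when
    # the covariance is isotropic, growing as energy concentrates in a
    # few directions.
    return (c_dim * torch.trace(cov @ cov) / torch.trace(cov) ** 2).item()

x = torch.randn(10000, 64)                                 # ~white input
print(whitening_metric(x))                                 # close to 1.0
print(whitening_metric(x * torch.linspace(0.1, 3.0, 64)))  # well above 1.0
```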
], batch size: 55, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:24:49,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=352626.6666666667, ans=0.125 2023-11-18 18:24:52,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=352626.6666666667, ans=10.0 2023-11-18 18:25:09,549 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=352760.0, ans=0.1 2023-11-18 18:25:22,633 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.93 vs. limit=15.0 2023-11-18 18:25:34,877 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.158e+01 9.461e+01 1.064e+02 1.235e+02 1.881e+02, threshold=2.128e+02, percent-clipped=0.0 2023-11-18 18:25:39,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=352893.3333333333, ans=0.125 2023-11-18 18:25:41,304 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 4850, loss[loss=0.08295, simple_loss=0.08494, pruned_loss=0.02908, audio_tagging_loss=0.0114, over 15277.00 frames. ], tot_loss[loss=0.1033, simple_loss=0.1177, pruned_loss=0.03267, audio_tagging_loss=0.01175, over 3055234.59 frames. ], batch size: 62, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:25:54,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=353026.6666666667, ans=0.0 2023-11-18 18:26:08,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=353093.3333333333, ans=0.0 2023-11-18 18:26:37,597 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 4900, loss[loss=0.08722, simple_loss=0.1051, pruned_loss=0.02507, audio_tagging_loss=0.009585, over 15198.00 frames. ], tot_loss[loss=0.1037, simple_loss=0.1183, pruned_loss=0.03287, audio_tagging_loss=0.01163, over 3054353.68 frames. ], batch size: 57, lr: 1.37e-02, grad_scale: 16.0 2023-11-18 18:26:40,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=353293.3333333333, ans=0.125 2023-11-18 18:27:28,031 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.657e+01 9.441e+01 1.051e+02 1.165e+02 1.612e+02, threshold=2.102e+02, percent-clipped=0.0 2023-11-18 18:27:33,415 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 4950, loss[loss=0.083, simple_loss=0.0908, pruned_loss=0.02413, audio_tagging_loss=0.01347, over 15620.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1174, pruned_loss=0.03247, audio_tagging_loss=0.01157, over 3051660.29 frames. ], batch size: 61, lr: 1.37e-02, grad_scale: 16.0 2023-11-18 18:27:48,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=353693.3333333333, ans=0.0 2023-11-18 18:27:53,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=353693.3333333333, ans=0.125 2023-11-18 18:28:05,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=353760.0, ans=0.0 2023-11-18 18:28:14,385 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.87 vs. 
limit=15.0 2023-11-18 18:28:29,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=353960.0, ans=0.125 2023-11-18 18:28:30,000 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 5000, loss[loss=0.07138, simple_loss=0.07091, pruned_loss=0.01958, audio_tagging_loss=0.01634, over 13508.00 frames. ], tot_loss[loss=0.1026, simple_loss=0.1174, pruned_loss=0.03246, audio_tagging_loss=0.01145, over 3049114.35 frames. ], batch size: 56, lr: 1.37e-02, grad_scale: 16.0 2023-11-18 18:28:34,835 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.52 vs. limit=15.0 2023-11-18 18:28:50,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=354026.6666666667, ans=0.125 2023-11-18 18:29:03,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=354160.0, ans=0.2 2023-11-18 18:29:04,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=354160.0, ans=0.0 2023-11-18 18:29:13,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=354226.6666666667, ans=0.1 2023-11-18 18:29:19,885 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.888e+01 9.416e+01 1.033e+02 1.125e+02 1.808e+02, threshold=2.065e+02, percent-clipped=0.0 2023-11-18 18:29:20,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=354226.6666666667, ans=0.04949747468305833 2023-11-18 18:29:26,420 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 5050, loss[loss=0.1025, simple_loss=0.119, pruned_loss=0.03352, audio_tagging_loss=0.009488, over 16069.00 frames. ], tot_loss[loss=0.1029, simple_loss=0.1178, pruned_loss=0.0327, audio_tagging_loss=0.01133, over 3049865.38 frames. ], batch size: 61, lr: 1.37e-02, grad_scale: 16.0 2023-11-18 18:29:37,852 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.36 vs. limit=22.5 2023-11-18 18:29:43,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=354360.0, ans=0.125 2023-11-18 18:29:52,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=354426.6666666667, ans=0.125 2023-11-18 18:29:56,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=354426.6666666667, ans=0.125 2023-11-18 18:30:08,535 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.67 vs. limit=22.5 2023-11-18 18:30:12,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=354560.0, ans=0.0 2023-11-18 18:30:14,813 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.33 vs. limit=15.0 2023-11-18 18:30:21,620 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 5100, loss[loss=0.1391, simple_loss=0.1614, pruned_loss=0.05145, audio_tagging_loss=0.00694, over 15678.00 frames. 
], tot_loss[loss=0.1032, simple_loss=0.118, pruned_loss=0.03291, audio_tagging_loss=0.01127, over 3052243.21 frames. ], batch size: 56, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:30:44,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=354760.0, ans=0.0 2023-11-18 18:30:45,518 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:30:53,883 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.787e-01 2023-11-18 18:31:11,571 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.005e+01 9.700e+01 1.061e+02 1.154e+02 1.523e+02, threshold=2.123e+02, percent-clipped=0.0 2023-11-18 18:31:13,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=354893.3333333333, ans=0.1 2023-11-18 18:31:17,924 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 5150, loss[loss=0.1058, simple_loss=0.1236, pruned_loss=0.0346, audio_tagging_loss=0.009451, over 16129.00 frames. ], tot_loss[loss=0.1035, simple_loss=0.1184, pruned_loss=0.033, audio_tagging_loss=0.01123, over 3049928.00 frames. ], batch size: 61, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:31:34,486 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.24 vs. limit=6.0 2023-11-18 18:31:46,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=355093.3333333333, ans=0.1 2023-11-18 18:32:13,660 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 5200, loss[loss=0.08496, simple_loss=0.09996, pruned_loss=0.02615, audio_tagging_loss=0.008838, over 13759.00 frames. ], tot_loss[loss=0.1033, simple_loss=0.1183, pruned_loss=0.03292, audio_tagging_loss=0.01122, over 3036971.12 frames. ], batch size: 54, lr: 1.36e-02, grad_scale: 32.0 2023-11-18 18:32:28,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=355360.0, ans=0.125 2023-11-18 18:32:33,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=355360.0, ans=0.125 2023-11-18 18:32:41,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=355426.6666666667, ans=0.2 2023-11-18 18:32:52,323 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.22 vs. limit=22.5 2023-11-18 18:33:03,978 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 9.196e+01 1.027e+02 1.112e+02 1.442e+02, threshold=2.053e+02, percent-clipped=0.0 2023-11-18 18:33:09,311 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 5250, loss[loss=0.09946, simple_loss=0.1103, pruned_loss=0.02997, audio_tagging_loss=0.01436, over 16915.00 frames. ], tot_loss[loss=0.1036, simple_loss=0.1189, pruned_loss=0.03299, audio_tagging_loss=0.01113, over 3038268.51 frames. 
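Annotation: across this stretch the learning rate steps down from 1.38e-02 to 1.35e-02, varying with both the global step count and the epoch. That is consistent with icefall's Eden-style schedule, sketched below; the constants are assumptions about this run's configuration, chosen because they reproduce the logged value (about 1.38e-02 near step 45k in epoch 5):

```python
# Sketch: an Eden-style learning-rate schedule (assumed constants).

def eden_lr(base_lr: float, step: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Discount the base lr by both elapsed steps and elapsed epochs;
    # each factor decays roughly like x**-0.5 once x is large.
    batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

for step in (40000, 45000, 50000):
    print(step, f"{eden_lr(0.045, step, epoch=5.0):.2e}")
# step 45000 -> ~1.38e-02, matching the lr logged above.
```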
], batch size: 64, lr: 1.36e-02, grad_scale: 32.0 2023-11-18 18:33:10,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=355626.6666666667, ans=0.1 2023-11-18 18:33:16,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=355626.6666666667, ans=0.0 2023-11-18 18:33:22,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=355693.3333333333, ans=0.1 2023-11-18 18:33:24,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=355693.3333333333, ans=0.1 2023-11-18 18:33:34,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=355760.0, ans=0.0 2023-11-18 18:33:44,747 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.69 vs. limit=15.0 2023-11-18 18:33:48,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=355826.6666666667, ans=0.125 2023-11-18 18:33:50,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=355826.6666666667, ans=0.125 2023-11-18 18:33:52,256 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.80 vs. limit=15.0 2023-11-18 18:33:53,307 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.68 vs. limit=15.0 2023-11-18 18:33:57,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=355893.3333333333, ans=0.125 2023-11-18 18:34:04,633 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 5300, loss[loss=0.09439, simple_loss=0.1195, pruned_loss=0.02908, audio_tagging_loss=0.00557, over 14958.00 frames. ], tot_loss[loss=0.1024, simple_loss=0.1172, pruned_loss=0.0326, audio_tagging_loss=0.01121, over 3038018.89 frames. ], batch size: 57, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:34:29,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=356093.3333333333, ans=0.1 2023-11-18 18:34:32,349 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.53 vs. limit=12.0 2023-11-18 18:34:41,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=356160.0, ans=0.125 2023-11-18 18:34:51,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=356226.6666666667, ans=0.125 2023-11-18 18:34:55,613 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 8.052e+01 9.440e+01 1.053e+02 1.166e+02 1.714e+02, threshold=2.106e+02, percent-clipped=0.0 2023-11-18 18:35:00,876 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 5350, loss[loss=0.06274, simple_loss=0.07388, pruned_loss=0.01448, audio_tagging_loss=0.01132, over 15020.00 frames. ], tot_loss[loss=0.1025, simple_loss=0.1173, pruned_loss=0.03261, audio_tagging_loss=0.01128, over 3036878.07 frames. 
], batch size: 58, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:35:01,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=356293.3333333333, ans=0.125 2023-11-18 18:35:18,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=356360.0, ans=0.0 2023-11-18 18:35:28,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=356426.6666666667, ans=0.2 2023-11-18 18:35:33,005 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=356493.3333333333, ans=0.0 2023-11-18 18:35:33,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=356493.3333333333, ans=0.5 2023-11-18 18:35:55,917 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 5400, loss[loss=0.1241, simple_loss=0.1379, pruned_loss=0.04371, audio_tagging_loss=0.01149, over 15276.00 frames. ], tot_loss[loss=0.102, simple_loss=0.1166, pruned_loss=0.03239, audio_tagging_loss=0.01129, over 3032382.73 frames. ], batch size: 57, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:36:26,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=356760.0, ans=0.2 2023-11-18 18:36:37,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=356826.6666666667, ans=0.0 2023-11-18 18:36:37,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=356826.6666666667, ans=0.125 2023-11-18 18:36:46,630 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.262e+01 8.854e+01 9.931e+01 1.101e+02 1.556e+02, threshold=1.986e+02, percent-clipped=0.0 2023-11-18 18:36:51,434 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 5450, loss[loss=0.1015, simple_loss=0.1012, pruned_loss=0.03514, audio_tagging_loss=0.01576, over 16372.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1167, pruned_loss=0.03252, audio_tagging_loss=0.01141, over 3034804.57 frames. ], batch size: 64, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:36:52,192 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.10 vs. limit=15.0 2023-11-18 18:36:53,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=356960.0, ans=0.1 2023-11-18 18:36:53,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=356960.0, ans=0.0 2023-11-18 18:37:06,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=357026.6666666667, ans=0.125 2023-11-18 18:37:16,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=357093.3333333333, ans=0.125 2023-11-18 18:37:25,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=357160.0, ans=0.125 2023-11-18 18:37:25,619 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.85 vs. 
limit=12.0 2023-11-18 18:37:27,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=357160.0, ans=0.125 2023-11-18 18:37:46,501 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 5500, loss[loss=0.09826, simple_loss=0.1173, pruned_loss=0.02906, audio_tagging_loss=0.01052, over 16163.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1172, pruned_loss=0.03269, audio_tagging_loss=0.01148, over 3040217.43 frames. ], batch size: 61, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:37:56,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=357293.3333333333, ans=0.125 2023-11-18 18:37:57,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=357360.0, ans=0.2 2023-11-18 18:38:22,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=357493.3333333333, ans=0.125 2023-11-18 18:38:27,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=357493.3333333333, ans=0.0 2023-11-18 18:38:31,842 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.38 vs. limit=6.0 2023-11-18 18:38:36,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=357560.0, ans=0.125 2023-11-18 18:38:38,229 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.010e+01 9.009e+01 9.946e+01 1.101e+02 1.690e+02, threshold=1.989e+02, percent-clipped=0.0 2023-11-18 18:38:38,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=357560.0, ans=0.1 2023-11-18 18:38:38,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=357560.0, ans=0.125 2023-11-18 18:38:42,431 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 5550, loss[loss=0.1309, simple_loss=0.146, pruned_loss=0.04517, audio_tagging_loss=0.01274, over 15467.00 frames. ], tot_loss[loss=0.1037, simple_loss=0.1178, pruned_loss=0.03322, audio_tagging_loss=0.01155, over 3047047.21 frames. ], batch size: 57, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:39:12,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.95 vs. limit=15.0 2023-11-18 18:39:18,680 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=12.0 2023-11-18 18:39:22,204 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.41 vs. limit=6.0 2023-11-18 18:39:25,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=357826.6666666667, ans=0.2 2023-11-18 18:39:26,616 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.74 vs. limit=15.0 2023-11-18 18:39:29,956 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. 
limit=6.0 2023-11-18 18:39:31,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=357893.3333333333, ans=0.125 2023-11-18 18:39:31,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=357893.3333333333, ans=0.1 2023-11-18 18:39:37,598 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 5600, loss[loss=0.1092, simple_loss=0.1245, pruned_loss=0.03497, audio_tagging_loss=0.01198, over 15570.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.1167, pruned_loss=0.03266, audio_tagging_loss=0.01166, over 3050274.62 frames. ], batch size: 60, lr: 1.36e-02, grad_scale: 32.0 2023-11-18 18:39:52,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=358026.6666666667, ans=0.125 2023-11-18 18:39:53,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=358026.6666666667, ans=0.1 2023-11-18 18:39:59,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=358093.3333333333, ans=0.125 2023-11-18 18:40:08,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=358093.3333333333, ans=0.0 2023-11-18 18:40:10,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=358160.0, ans=0.2 2023-11-18 18:40:11,855 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.19 vs. limit=12.0 2023-11-18 18:40:15,562 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 18:40:20,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=358160.0, ans=0.04949747468305833 2023-11-18 18:40:24,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=358226.6666666667, ans=0.0 2023-11-18 18:40:29,638 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.690e+01 9.084e+01 1.022e+02 1.205e+02 1.640e+02, threshold=2.044e+02, percent-clipped=0.0 2023-11-18 18:40:32,838 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 5650, loss[loss=0.105, simple_loss=0.1252, pruned_loss=0.02891, audio_tagging_loss=0.01346, over 13864.00 frames. ], tot_loss[loss=0.1026, simple_loss=0.1169, pruned_loss=0.03259, audio_tagging_loss=0.01162, over 3048175.40 frames. 
], batch size: 53, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:40:33,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=358293.3333333333, ans=0.0 2023-11-18 18:40:44,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=358360.0, ans=0.125 2023-11-18 18:40:54,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=358426.6666666667, ans=0.0 2023-11-18 18:41:03,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=358426.6666666667, ans=0.2 2023-11-18 18:41:22,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=358560.0, ans=0.1 2023-11-18 18:41:29,428 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 5700, loss[loss=0.1173, simple_loss=0.1383, pruned_loss=0.04243, audio_tagging_loss=0.005679, over 15202.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.117, pruned_loss=0.03228, audio_tagging_loss=0.01155, over 3047495.65 frames. ], batch size: 57, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:41:34,220 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.66 vs. limit=22.5 2023-11-18 18:42:21,631 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.993e+01 9.865e+01 1.099e+02 1.758e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-18 18:42:24,818 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 5750, loss[loss=0.09712, simple_loss=0.1176, pruned_loss=0.02634, audio_tagging_loss=0.01198, over 15165.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.117, pruned_loss=0.03217, audio_tagging_loss=0.01149, over 3054356.66 frames. ], batch size: 57, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:42:34,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=358960.0, ans=0.0 2023-11-18 18:42:36,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=359026.6666666667, ans=0.1 2023-11-18 18:42:56,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=359093.3333333333, ans=0.125 2023-11-18 18:43:13,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=359226.6666666667, ans=0.0 2023-11-18 18:43:15,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=359226.6666666667, ans=0.125 2023-11-18 18:43:18,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=359226.6666666667, ans=0.125 2023-11-18 18:43:20,444 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 5800, loss[loss=0.1131, simple_loss=0.1263, pruned_loss=0.03681, audio_tagging_loss=0.01308, over 16390.00 frames. ], tot_loss[loss=0.1017, simple_loss=0.1163, pruned_loss=0.0321, audio_tagging_loss=0.01145, over 3047490.80 frames. 
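Annotation: the fractional frame counts in the tot_loss entries (e.g. "over 3047490.80 frames" just above) suggest an exponentially decayed running sum rather than a fixed window: each batch the accumulated statistics are multiplied by a decay factor slightly below 1 before the new batch is added. With roughly 15k frames per batch, a decay of 1 - 1/200 gives a steady-state total near 3.0M frames, matching the log; that decay constant is an assumption inferred from these magnitudes.

```python
# Sketch: an exponentially decayed loss tracker consistent with the
# fractional "over N frames" totals above.

class RunningLoss:
    def __init__(self, decay: float = 1.0 - 1.0 / 200):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss_sum: float, batch_frames: int):
        self.loss_sum = self.loss_sum * self.decay + batch_loss_sum
        self.frames = self.frames * self.decay + batch_frames

    @property
    def tot_loss(self) -> float:
        return self.loss_sum / self.frames

tracker = RunningLoss()
for _ in range(2000):            # long enough to reach steady state
    tracker.update(batch_loss_sum=0.10 * 15000, batch_frames=15000)
print(tracker.frames)            # ~3.0e6 frames, like the log
print(tracker.tot_loss)          # ~0.10
```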
], batch size: 61, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:43:20,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=359293.3333333333, ans=0.125 2023-11-18 18:43:29,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=359293.3333333333, ans=0.125 2023-11-18 18:44:12,847 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.377e+01 8.910e+01 9.863e+01 1.080e+02 1.378e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-18 18:44:14,196 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.07 vs. limit=15.0 2023-11-18 18:44:16,604 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 5850, loss[loss=0.08354, simple_loss=0.08778, pruned_loss=0.02432, audio_tagging_loss=0.01533, over 14925.00 frames. ], tot_loss[loss=0.1018, simple_loss=0.1166, pruned_loss=0.03221, audio_tagging_loss=0.01136, over 3045564.46 frames. ], batch size: 58, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:44:24,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=359626.6666666667, ans=0.0 2023-11-18 18:44:51,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=359826.6666666667, ans=0.125 2023-11-18 18:45:01,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=359893.3333333333, ans=0.1 2023-11-18 18:45:02,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=359893.3333333333, ans=0.125 2023-11-18 18:45:12,057 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 5900, loss[loss=0.09011, simple_loss=0.09818, pruned_loss=0.0271, audio_tagging_loss=0.01392, over 15057.00 frames. ], tot_loss[loss=0.1029, simple_loss=0.1179, pruned_loss=0.03263, audio_tagging_loss=0.01136, over 3046887.04 frames. ], batch size: 57, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:45:12,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=359960.0, ans=0.07 2023-11-18 18:45:36,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=360093.3333333333, ans=0.125 2023-11-18 18:45:45,960 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.518e-03 2023-11-18 18:45:48,147 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2023-11-18 18:46:04,525 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 9.214e+01 1.011e+02 1.146e+02 1.411e+02, threshold=2.022e+02, percent-clipped=0.0 2023-11-18 18:46:07,759 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 5950, loss[loss=0.09969, simple_loss=0.1044, pruned_loss=0.03152, audio_tagging_loss=0.01597, over 14540.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.1177, pruned_loss=0.03246, audio_tagging_loss=0.01135, over 3045766.93 frames. 
], batch size: 54, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:46:12,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=360293.3333333333, ans=0.0 2023-11-18 18:46:19,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=360360.0, ans=0.125 2023-11-18 18:46:19,130 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:46:43,570 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.14 vs. limit=22.5 2023-11-18 18:46:44,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=360493.3333333333, ans=0.035 2023-11-18 18:47:03,806 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 6000, loss[loss=0.1034, simple_loss=0.1278, pruned_loss=0.03006, audio_tagging_loss=0.00943, over 14937.00 frames. ], tot_loss[loss=0.1025, simple_loss=0.1175, pruned_loss=0.03237, audio_tagging_loss=0.01132, over 3048989.22 frames. ], batch size: 54, lr: 1.35e-02, grad_scale: 32.0 2023-11-18 18:47:03,807 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-18 18:47:36,965 INFO [train_asr.py:1147] (3/4) Epoch 5, validation: loss=0.0732, simple_loss=0.06039, pruned_loss=0.009139, audio_tagging_loss=0.03386, over 4681554.00 frames. 2023-11-18 18:47:36,966 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-18 18:47:39,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=360626.6666666667, ans=0.125 2023-11-18 18:47:40,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=360626.6666666667, ans=0.5 2023-11-18 18:47:45,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=360626.6666666667, ans=0.125 2023-11-18 18:47:48,791 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:48:13,921 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 18:48:14,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=360826.6666666667, ans=0.125 2023-11-18 18:48:28,700 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.496e+01 9.155e+01 9.916e+01 1.075e+02 1.410e+02, threshold=1.983e+02, percent-clipped=0.0 2023-11-18 18:48:31,980 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 6050, loss[loss=0.09437, simple_loss=0.1052, pruned_loss=0.02936, audio_tagging_loss=0.01239, over 14058.00 frames. ], tot_loss[loss=0.102, simple_loss=0.1168, pruned_loss=0.03218, audio_tagging_loss=0.01142, over 3044250.31 frames. 
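Annotation: at batch 6000 above, training pauses for "Computing validation loss" and then resumes, i.e. validation runs on a fixed batch interval rather than at epoch boundaries. A minimal sketch of that interleaving; the interval value and the helper names are placeholders, not this script's API:

```python
# Sketch: mid-epoch validation interleaved with training, as in the
# log entries above.
import torch

def train_one_epoch(model, optimizer, train_loader, valid_loader,
                    compute_loss, valid_interval: int = 3000):
    model.train()
    for batch_idx, batch in enumerate(train_loader):
        optimizer.zero_grad()
        loss = compute_loss(model, batch)
        loss.backward()
        optimizer.step()
        if batch_idx > 0 and batch_idx % valid_interval == 0:
            model.eval()                      # "Computing validation loss"
            with torch.no_grad():
                valid_loss = sum(compute_loss(model, b).item()
                                 for b in valid_loader)
            model.train()                     # resume training, as in the log
            print(f"batch {batch_idx}: validation loss sum = {valid_loss:.4f}")
```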
], batch size: 53, lr: 1.35e-02, grad_scale: 32.0 2023-11-18 18:49:05,162 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:49:12,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=361160.0, ans=0.0 2023-11-18 18:49:17,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=361226.6666666667, ans=0.1 2023-11-18 18:49:28,168 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 6100, loss[loss=0.1072, simple_loss=0.1239, pruned_loss=0.03344, audio_tagging_loss=0.01183, over 15815.00 frames. ], tot_loss[loss=0.1029, simple_loss=0.1182, pruned_loss=0.0325, audio_tagging_loss=0.01129, over 3051026.75 frames. ], batch size: 58, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:49:31,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=361293.3333333333, ans=0.0 2023-11-18 18:49:38,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=361360.0, ans=0.2 2023-11-18 18:49:50,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=361426.6666666667, ans=0.125 2023-11-18 18:50:17,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=361560.0, ans=0.0 2023-11-18 18:50:21,524 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 9.202e+01 1.052e+02 1.142e+02 1.737e+02, threshold=2.103e+02, percent-clipped=0.0 2023-11-18 18:50:23,665 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 6150, loss[loss=0.1341, simple_loss=0.155, pruned_loss=0.04712, audio_tagging_loss=0.009486, over 14935.00 frames. ], tot_loss[loss=0.104, simple_loss=0.1195, pruned_loss=0.03297, audio_tagging_loss=0.01122, over 3050829.96 frames. ], batch size: 54, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:50:35,070 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.25 vs. limit=10.0 2023-11-18 18:50:40,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=361693.3333333333, ans=0.125 2023-11-18 18:50:51,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=361760.0, ans=0.125 2023-11-18 18:50:54,849 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=361760.0, ans=0.1 2023-11-18 18:51:18,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=361893.3333333333, ans=0.125 2023-11-18 18:51:20,188 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 6200, loss[loss=0.1267, simple_loss=0.1425, pruned_loss=0.04356, audio_tagging_loss=0.01193, over 14828.00 frames. ], tot_loss[loss=0.104, simple_loss=0.1193, pruned_loss=0.03303, audio_tagging_loss=0.01138, over 3054183.09 frames. 
2023-11-18 18:51:20,188 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 6200, loss[loss=0.1267, simple_loss=0.1425, pruned_loss=0.04356, audio_tagging_loss=0.01193, over 14828.00 frames. ], tot_loss[loss=0.104, simple_loss=0.1193, pruned_loss=0.03303, audio_tagging_loss=0.01138, over 3054183.09 frames. ], batch size: 56, lr: 1.35e-02, grad_scale: 16.0
2023-11-18 18:51:21,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=361960.0, ans=0.2
2023-11-18 18:51:39,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=362026.6666666667, ans=0.125
2023-11-18 18:51:39,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=362026.6666666667, ans=0.125
2023-11-18 18:51:42,053 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.62 vs. limit=10.0
2023-11-18 18:51:44,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=362093.3333333333, ans=0.2
2023-11-18 18:51:59,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=362160.0, ans=0.0
2023-11-18 18:52:03,014 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.66 vs. limit=15.0
2023-11-18 18:52:10,064 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.93 vs. limit=15.0
2023-11-18 18:52:10,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=362226.6666666667, ans=0.125
2023-11-18 18:52:14,243 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 9.346e+01 1.036e+02 1.107e+02 1.533e+02, threshold=2.072e+02, percent-clipped=0.0
2023-11-18 18:52:16,403 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 6250, loss[loss=0.1203, simple_loss=0.1364, pruned_loss=0.04418, audio_tagging_loss=0.007867, over 16356.00 frames. ], tot_loss[loss=0.103, simple_loss=0.1178, pruned_loss=0.03252, audio_tagging_loss=0.01157, over 3054555.93 frames. ], batch size: 63, lr: 1.35e-02, grad_scale: 16.0
2023-11-18 18:52:18,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=362293.3333333333, ans=0.125
2023-11-18 18:52:30,612 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.46 vs. limit=15.0
2023-11-18 18:53:05,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=362560.0, ans=0.125
2023-11-18 18:53:07,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=362560.0, ans=0.2
2023-11-18 18:53:09,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=362560.0, ans=0.125
2023-11-18 18:53:10,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=362626.6666666667, ans=0.125
2023-11-18 18:53:11,487 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 6300, loss[loss=0.05471, simple_loss=0.05061, pruned_loss=0.01107, audio_tagging_loss=0.01833, over 15211.00 frames. ], tot_loss[loss=0.1022, simple_loss=0.1167, pruned_loss=0.0321, audio_tagging_loss=0.01177, over 3044518.13 frames. ], batch size: 61, lr: 1.35e-02, grad_scale: 16.0
2023-11-18 18:53:15,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=362626.6666666667, ans=0.1
2023-11-18 18:53:17,800 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.36 vs. limit=15.0
2023-11-18 18:53:19,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=362626.6666666667, ans=0.04949747468305833
2023-11-18 18:53:44,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=362826.6666666667, ans=0.125
2023-11-18 18:53:45,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=362826.6666666667, ans=0.0
2023-11-18 18:53:52,341 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.96 vs. limit=12.0
2023-11-18 18:53:55,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=362893.3333333333, ans=0.125
2023-11-18 18:54:04,969 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.371e+01 9.079e+01 9.861e+01 1.090e+02 1.541e+02, threshold=1.972e+02, percent-clipped=0.0
2023-11-18 18:54:05,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=362893.3333333333, ans=0.125
2023-11-18 18:54:06,457 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.80 vs. limit=15.0
2023-11-18 18:54:07,085 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 6350, loss[loss=0.1043, simple_loss=0.1232, pruned_loss=0.03504, audio_tagging_loss=0.007713, over 15526.00 frames. ], tot_loss[loss=0.1019, simple_loss=0.1161, pruned_loss=0.03206, audio_tagging_loss=0.01182, over 3040516.80 frames. ], batch size: 57, lr: 1.35e-02, grad_scale: 16.0
2023-11-18 18:54:33,376 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.043e-02
2023-11-18 18:54:34,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=363093.3333333333, ans=0.125
2023-11-18 18:54:43,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=363160.0, ans=0.0
2023-11-18 18:54:45,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=363160.0, ans=0.125
2023-11-18 18:54:58,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=363226.6666666667, ans=0.125
2023-11-18 18:55:03,933 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 6400, loss[loss=0.09743, simple_loss=0.102, pruned_loss=0.03168, audio_tagging_loss=0.01477, over 15439.00 frames. ], tot_loss[loss=0.1019, simple_loss=0.1158, pruned_loss=0.03207, audio_tagging_loss=0.01197, over 3043398.06 frames. ], batch size: 60, lr: 1.35e-02, grad_scale: 32.0
2023-11-18 18:55:06,200 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-18 18:55:18,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=363360.0, ans=0.09899494936611666
2023-11-18 18:55:32,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=363426.6666666667, ans=0.125
2023-11-18 18:55:36,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=363493.3333333333, ans=0.125
2023-11-18 18:55:56,645 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.151e+01 9.321e+01 1.035e+02 1.143e+02 1.548e+02, threshold=2.069e+02, percent-clipped=0.0
2023-11-18 18:55:58,780 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 6450, loss[loss=0.1057, simple_loss=0.1128, pruned_loss=0.0359, audio_tagging_loss=0.01342, over 14084.00 frames. ], tot_loss[loss=0.1025, simple_loss=0.1166, pruned_loss=0.03228, audio_tagging_loss=0.01189, over 3035046.41 frames. ], batch size: 54, lr: 1.35e-02, grad_scale: 32.0
2023-11-18 18:56:01,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=363626.6666666667, ans=0.125
2023-11-18 18:56:14,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=363693.3333333333, ans=0.0
2023-11-18 18:56:15,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=363693.3333333333, ans=0.125
2023-11-18 18:56:35,253 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.68 vs. limit=10.0
2023-11-18 18:56:38,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=363826.6666666667, ans=0.125
2023-11-18 18:56:41,367 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 18:56:42,708 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.47 vs. limit=22.5
2023-11-18 18:56:46,029 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.82 vs. limit=15.0
2023-11-18 18:56:54,225 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 6500, loss[loss=0.09409, simple_loss=0.1166, pruned_loss=0.02718, audio_tagging_loss=0.008621, over 15645.00 frames. ], tot_loss[loss=0.1025, simple_loss=0.1167, pruned_loss=0.03234, audio_tagging_loss=0.01184, over 3032938.80 frames. ], batch size: 56, lr: 1.35e-02, grad_scale: 32.0
2023-11-18 18:56:58,066 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.88 vs. limit=15.0
2023-11-18 18:57:10,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=364026.6666666667, ans=0.07
2023-11-18 18:57:26,516 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.54 vs. limit=15.0
2023-11-18 18:57:45,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=364226.6666666667, ans=0.0
2023-11-18 18:57:45,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=364226.6666666667, ans=0.0
2023-11-18 18:57:48,371 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 9.392e+01 1.004e+02 1.100e+02 1.543e+02, threshold=2.007e+02, percent-clipped=0.0
2023-11-18 18:57:50,516 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 6550, loss[loss=0.1002, simple_loss=0.1073, pruned_loss=0.03361, audio_tagging_loss=0.01296, over 15594.00 frames. ], tot_loss[loss=0.1013, simple_loss=0.1154, pruned_loss=0.03189, audio_tagging_loss=0.01168, over 3034671.22 frames. ], batch size: 59, lr: 1.35e-02, grad_scale: 32.0
2023-11-18 18:58:06,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=364360.0, ans=0.125
2023-11-18 18:58:24,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=364493.3333333333, ans=0.1
2023-11-18 18:58:34,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=364493.3333333333, ans=22.5
2023-11-18 18:58:46,368 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 6600, loss[loss=0.06316, simple_loss=0.05757, pruned_loss=0.01478, audio_tagging_loss=0.01959, over 14413.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.1156, pruned_loss=0.03211, audio_tagging_loss=0.01157, over 3034802.97 frames. ], batch size: 57, lr: 1.35e-02, grad_scale: 32.0
2023-11-18 18:58:48,104 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.49 vs. limit=15.0
2023-11-18 18:58:56,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=364693.3333333333, ans=0.0
2023-11-18 18:58:57,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=364693.3333333333, ans=0.1
2023-11-18 18:59:06,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=364693.3333333333, ans=0.0
2023-11-18 18:59:39,646 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.373e+01 8.947e+01 1.006e+02 1.140e+02 1.601e+02, threshold=2.013e+02, percent-clipped=0.0
2023-11-18 18:59:41,803 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 6650, loss[loss=0.0949, simple_loss=0.1048, pruned_loss=0.02936, audio_tagging_loss=0.01314, over 16316.00 frames. ], tot_loss[loss=0.1003, simple_loss=0.1144, pruned_loss=0.0316, audio_tagging_loss=0.01148, over 3033576.97 frames. ], batch size: 63, lr: 1.35e-02, grad_scale: 32.0
2023-11-18 18:59:44,353 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.02 vs. limit=6.0
2023-11-18 18:59:49,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=364960.0, ans=0.0
2023-11-18 18:59:52,151 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.27 vs. limit=22.5
2023-11-18 19:00:00,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=365026.6666666667, ans=0.07
2023-11-18 19:00:27,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=365226.6666666667, ans=0.125
2023-11-18 19:00:37,640 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 6700, loss[loss=0.1171, simple_loss=0.1442, pruned_loss=0.0361, audio_tagging_loss=0.008868, over 16359.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.1153, pruned_loss=0.03187, audio_tagging_loss=0.01137, over 3030364.68 frames. ], batch size: 60, lr: 1.34e-02, grad_scale: 32.0
2023-11-18 19:00:38,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=365293.3333333333, ans=0.025
2023-11-18 19:00:54,526 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.69 vs. limit=15.0
2023-11-18 19:01:25,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=365560.0, ans=0.125
2023-11-18 19:01:32,113 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.700e+01 9.414e+01 1.042e+02 1.183e+02 1.878e+02, threshold=2.084e+02, percent-clipped=0.0
2023-11-18 19:01:33,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=365626.6666666667, ans=0.125
2023-11-18 19:01:34,262 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 6750, loss[loss=0.06941, simple_loss=0.06781, pruned_loss=0.0165, audio_tagging_loss=0.01901, over 17345.00 frames. ], tot_loss[loss=0.1014, simple_loss=0.1161, pruned_loss=0.03196, audio_tagging_loss=0.01141, over 3032305.69 frames. ], batch size: 70, lr: 1.34e-02, grad_scale: 32.0
2023-11-18 19:02:03,501 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.99 vs. limit=6.0
2023-11-18 19:02:05,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=365760.0, ans=0.2
2023-11-18 19:02:07,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=365826.6666666667, ans=0.125
2023-11-18 19:02:18,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=365893.3333333333, ans=0.125
2023-11-18 19:02:21,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=365893.3333333333, ans=0.125
2023-11-18 19:02:24,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=365893.3333333333, ans=0.1
2023-11-18 19:02:29,898 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 6800, loss[loss=0.07867, simple_loss=0.0874, pruned_loss=0.02406, audio_tagging_loss=0.01092, over 15380.00 frames. ], tot_loss[loss=0.1008, simple_loss=0.1152, pruned_loss=0.03174, audio_tagging_loss=0.01141, over 3033858.37 frames. ], batch size: 60, lr: 1.34e-02, grad_scale: 32.0
2023-11-18 19:02:49,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=366026.6666666667, ans=0.1
2023-11-18 19:03:05,320 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.46 vs. limit=10.0
2023-11-18 19:03:22,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=366226.6666666667, ans=0.2
2023-11-18 19:03:22,850 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.285e+01 8.984e+01 9.907e+01 1.134e+02 1.555e+02, threshold=1.981e+02, percent-clipped=0.0
2023-11-18 19:03:24,935 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 6850, loss[loss=0.07698, simple_loss=0.08829, pruned_loss=0.01547, audio_tagging_loss=0.01736, over 14035.00 frames. ], tot_loss[loss=0.1001, simple_loss=0.1145, pruned_loss=0.03147, audio_tagging_loss=0.01136, over 3035211.34 frames. ], batch size: 54, lr: 1.34e-02, grad_scale: 32.0
2023-11-18 19:03:48,203 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.75 vs. limit=6.0
2023-11-18 19:03:50,853 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.56 vs. limit=15.0
2023-11-18 19:03:57,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=366493.3333333333, ans=0.125
2023-11-18 19:04:21,393 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 6900, loss[loss=0.08841, simple_loss=0.09339, pruned_loss=0.03003, audio_tagging_loss=0.01169, over 16954.00 frames. ], tot_loss[loss=0.09965, simple_loss=0.1141, pruned_loss=0.03125, audio_tagging_loss=0.01136, over 3034028.27 frames. ], batch size: 64, lr: 1.34e-02, grad_scale: 32.0
2023-11-18 19:04:28,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=366626.6666666667, ans=0.0
2023-11-18 19:04:39,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=366693.3333333333, ans=0.025
2023-11-18 19:04:46,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=366760.0, ans=0.2
2023-11-18 19:05:02,948 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 19:05:15,601 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 9.152e+01 1.012e+02 1.130e+02 1.420e+02, threshold=2.024e+02, percent-clipped=0.0
2023-11-18 19:05:17,773 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 6950, loss[loss=0.1097, simple_loss=0.1329, pruned_loss=0.03422, audio_tagging_loss=0.009006, over 17203.00 frames. ], tot_loss[loss=0.1002, simple_loss=0.1147, pruned_loss=0.0314, audio_tagging_loss=0.0114, over 3047734.90 frames. ], batch size: 62, lr: 1.34e-02, grad_scale: 32.0
2023-11-18 19:05:21,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=366960.0, ans=0.125
2023-11-18 19:05:38,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=367026.6666666667, ans=10.0
2023-11-18 19:06:12,701 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 7000, loss[loss=0.09772, simple_loss=0.1021, pruned_loss=0.03511, audio_tagging_loss=0.01157, over 14403.00 frames. ], tot_loss[loss=0.1008, simple_loss=0.115, pruned_loss=0.03184, audio_tagging_loss=0.01147, over 3044070.82 frames. ], batch size: 57, lr: 1.34e-02, grad_scale: 32.0
2023-11-18 19:06:25,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=367360.0, ans=0.1
2023-11-18 19:06:34,555 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.05 vs. limit=22.5
2023-11-18 19:06:36,502 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.48 vs. limit=10.0
2023-11-18 19:06:44,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=367426.6666666667, ans=0.125
2023-11-18 19:07:06,341 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.94 vs. limit=12.0
2023-11-18 19:07:06,681 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 9.236e+01 1.008e+02 1.142e+02 1.683e+02, threshold=2.016e+02, percent-clipped=0.0
2023-11-18 19:07:08,811 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 7050, loss[loss=0.1066, simple_loss=0.1239, pruned_loss=0.0318, audio_tagging_loss=0.01288, over 15587.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.1158, pruned_loss=0.03204, audio_tagging_loss=0.01158, over 3047035.22 frames. ], batch size: 56, lr: 1.34e-02, grad_scale: 32.0
2023-11-18 19:07:12,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=367626.6666666667, ans=0.2
2023-11-18 19:07:23,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=367693.3333333333, ans=0.125
2023-11-18 19:07:30,651 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=22.5
2023-11-18 19:07:30,664 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.50 vs. limit=22.5
2023-11-18 19:07:34,763 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.31 vs. limit=22.5
2023-11-18 19:07:40,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=367826.6666666667, ans=0.125
2023-11-18 19:07:47,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=367826.6666666667, ans=0.125
2023-11-18 19:08:04,465 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 7100, loss[loss=0.1039, simple_loss=0.1192, pruned_loss=0.03278, audio_tagging_loss=0.01152, over 15164.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.1158, pruned_loss=0.03195, audio_tagging_loss=0.01163, over 3046604.71 frames. ], batch size: 56, lr: 1.34e-02, grad_scale: 32.0
2023-11-18 19:08:05,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=367960.0, ans=0.1
2023-11-18 19:08:12,517 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 19:08:21,136 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.50 vs. limit=15.0
2023-11-18 19:08:32,890 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.72 vs. limit=15.0
2023-11-18 19:08:36,152 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0
2023-11-18 19:08:39,937 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 19:08:57,713 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 9.190e+01 1.032e+02 1.164e+02 1.806e+02, threshold=2.063e+02, percent-clipped=0.0
2023-11-18 19:08:59,843 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 7150, loss[loss=0.08891, simple_loss=0.1034, pruned_loss=0.02453, audio_tagging_loss=0.01268, over 15580.00 frames. ], tot_loss[loss=0.102, simple_loss=0.1167, pruned_loss=0.03203, audio_tagging_loss=0.01161, over 3045317.17 frames. ], batch size: 59, lr: 1.34e-02, grad_scale: 32.0
2023-11-18 19:09:00,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=368293.3333333333, ans=0.125
2023-11-18 19:09:19,653 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.26 vs. limit=15.0
2023-11-18 19:09:29,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=368426.6666666667, ans=0.125
2023-11-18 19:09:33,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=368493.3333333333, ans=0.125
2023-11-18 19:09:34,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=368493.3333333333, ans=0.0
2023-11-18 19:09:45,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=368560.0, ans=0.0
2023-11-18 19:09:55,903 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 7200, loss[loss=0.1415, simple_loss=0.1626, pruned_loss=0.05184, audio_tagging_loss=0.008367, over 15793.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1171, pruned_loss=0.03213, audio_tagging_loss=0.0116, over 3048529.61 frames. ], batch size: 56, lr: 1.34e-02, grad_scale: 32.0
2023-11-18 19:10:00,459 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.58 vs. limit=15.0
2023-11-18 19:10:13,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=368693.3333333333, ans=0.125
2023-11-18 19:10:28,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=368826.6666666667, ans=0.1
2023-11-18 19:10:35,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=368826.6666666667, ans=0.1
2023-11-18 19:10:36,886 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.38 vs. limit=15.0
2023-11-18 19:10:46,083 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.311e-02
2023-11-18 19:10:49,020 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 9.156e+01 1.033e+02 1.136e+02 1.885e+02, threshold=2.065e+02, percent-clipped=0.0
2023-11-18 19:10:51,152 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 7250, loss[loss=0.1265, simple_loss=0.1466, pruned_loss=0.04125, audio_tagging_loss=0.01199, over 16183.00 frames. ], tot_loss[loss=0.1019, simple_loss=0.1166, pruned_loss=0.03196, audio_tagging_loss=0.01166, over 3043461.27 frames. ], batch size: 58, lr: 1.34e-02, grad_scale: 32.0
2023-11-18 19:10:56,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=368960.0, ans=0.1
2023-11-18 19:11:03,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=369026.6666666667, ans=0.125
2023-11-18 19:11:05,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=369026.6666666667, ans=0.125
2023-11-18 19:11:07,702 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=12.0
2023-11-18 19:11:22,654 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.10 vs. limit=15.0
2023-11-18 19:11:25,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=369160.0, ans=0.125
2023-11-18 19:11:26,668 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.21 vs. limit=15.0
2023-11-18 19:11:29,945 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.04 vs. limit=15.0
2023-11-18 19:11:47,513 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 7300, loss[loss=0.102, simple_loss=0.1221, pruned_loss=0.03105, audio_tagging_loss=0.009874, over 15045.00 frames. ], tot_loss[loss=0.1017, simple_loss=0.1169, pruned_loss=0.03182, audio_tagging_loss=0.01149, over 3040164.05 frames. ], batch size: 58, lr: 1.34e-02, grad_scale: 32.0
2023-11-18 19:12:04,033 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 19:12:07,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=369360.0, ans=10.0
2023-11-18 19:12:20,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=369493.3333333333, ans=0.1
2023-11-18 19:12:20,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=369493.3333333333, ans=0.125
2023-11-18 19:12:28,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=369493.3333333333, ans=0.07
2023-11-18 19:12:31,947 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.52 vs. limit=10.0
2023-11-18 19:12:41,596 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.861e+01 9.294e+01 1.042e+02 1.203e+02 1.669e+02, threshold=2.084e+02, percent-clipped=0.0
2023-11-18 19:12:44,283 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 7350, loss[loss=0.07104, simple_loss=0.08257, pruned_loss=0.0188, audio_tagging_loss=0.01095, over 14664.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.1174, pruned_loss=0.03209, audio_tagging_loss=0.01132, over 3038686.14 frames. ], batch size: 56, lr: 1.34e-02, grad_scale: 32.0
2023-11-18 19:12:46,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=369626.6666666667, ans=0.0
2023-11-18 19:12:55,431 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.03 vs. limit=15.0
2023-11-18 19:13:04,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=369693.3333333333, ans=0.125
2023-11-18 19:13:23,388 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.64 vs. limit=15.0
2023-11-18 19:13:29,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=369893.3333333333, ans=0.125
2023-11-18 19:13:39,073 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 7400, loss[loss=0.1072, simple_loss=0.1189, pruned_loss=0.03675, audio_tagging_loss=0.01105, over 15660.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.1174, pruned_loss=0.03215, audio_tagging_loss=0.01124, over 3036303.85 frames. ], batch size: 60, lr: 1.34e-02, grad_scale: 32.0
2023-11-18 19:13:42,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=369960.0, ans=0.125
2023-11-18 19:13:53,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=370026.6666666667, ans=0.0
2023-11-18 19:14:12,345 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.27 vs. limit=15.0
2023-11-18 19:14:17,572 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.82 vs. limit=15.0
2023-11-18 19:14:17,620 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.73 vs. limit=15.0
2023-11-18 19:14:22,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=370226.6666666667, ans=0.1
2023-11-18 19:14:32,448 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.641e+01 9.039e+01 9.551e+01 1.074e+02 1.292e+02, threshold=1.910e+02, percent-clipped=0.0
2023-11-18 19:14:34,624 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 7450, loss[loss=0.1021, simple_loss=0.1194, pruned_loss=0.03408, audio_tagging_loss=0.008258, over 15002.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1165, pruned_loss=0.03183, audio_tagging_loss=0.01117, over 3043504.04 frames. ], batch size: 57, lr: 1.34e-02, grad_scale: 32.0
2023-11-18 19:14:42,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=370293.3333333333, ans=0.125
2023-11-18 19:14:44,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=370360.0, ans=0.125
2023-11-18 19:14:56,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=370426.6666666667, ans=0.2
2023-11-18 19:14:58,802 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 19:14:59,146 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.29 vs. limit=6.0
2023-11-18 19:15:03,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=370426.6666666667, ans=0.125
2023-11-18 19:15:10,355 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.73 vs. limit=15.0
2023-11-18 19:15:19,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=370560.0, ans=0.1
2023-11-18 19:15:22,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=370560.0, ans=0.5
2023-11-18 19:15:30,679 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 7500, loss[loss=0.08415, simple_loss=0.09091, pruned_loss=0.02832, audio_tagging_loss=0.01038, over 14351.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1177, pruned_loss=0.03233, audio_tagging_loss=0.01114, over 3046635.87 frames. ], batch size: 55, lr: 1.34e-02, grad_scale: 32.0
2023-11-18 19:15:35,276 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.35 vs. limit=6.0
2023-11-18 19:15:44,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=370693.3333333333, ans=0.125
2023-11-18 19:15:51,556 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.45 vs. limit=6.0
2023-11-18 19:16:20,776 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0
2023-11-18 19:16:24,549 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.574e+01 9.004e+01 9.847e+01 1.087e+02 1.456e+02, threshold=1.969e+02, percent-clipped=0.0
2023-11-18 19:16:26,731 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 7550, loss[loss=0.1205, simple_loss=0.1375, pruned_loss=0.04251, audio_tagging_loss=0.009241, over 15068.00 frames. ], tot_loss[loss=0.1026, simple_loss=0.118, pruned_loss=0.03249, audio_tagging_loss=0.0111, over 3048149.53 frames. ], batch size: 55, lr: 1.33e-02, grad_scale: 32.0
2023-11-18 19:16:31,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=370960.0, ans=0.2
2023-11-18 19:17:07,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=371160.0, ans=0.2
2023-11-18 19:17:07,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=371160.0, ans=0.1
2023-11-18 19:17:19,880 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.67 vs. limit=15.0
2023-11-18 19:17:22,448 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 7600, loss[loss=0.09155, simple_loss=0.1037, pruned_loss=0.02773, audio_tagging_loss=0.01198, over 15757.00 frames. ], tot_loss[loss=0.1014, simple_loss=0.1164, pruned_loss=0.03195, audio_tagging_loss=0.01122, over 3049721.79 frames. ], batch size: 60, lr: 1.33e-02, grad_scale: 32.0
2023-11-18 19:17:22,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=371293.3333333333, ans=0.1
2023-11-18 19:17:33,936 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=22.5
2023-11-18 19:17:58,733 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 19:18:15,321 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 9.066e+01 9.750e+01 1.073e+02 2.127e+02, threshold=1.950e+02, percent-clipped=2.0
2023-11-18 19:18:18,598 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 7650, loss[loss=0.06783, simple_loss=0.07065, pruned_loss=0.01667, audio_tagging_loss=0.01584, over 14461.00 frames. ], tot_loss[loss=0.1007, simple_loss=0.1157, pruned_loss=0.03162, audio_tagging_loss=0.01126, over 3042781.63 frames. ], batch size: 57, lr: 1.33e-02, grad_scale: 32.0
2023-11-18 19:18:30,079 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.52 vs. limit=15.0
2023-11-18 19:18:33,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=371693.3333333333, ans=0.2
2023-11-18 19:18:59,769 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.47 vs. limit=15.0
2023-11-18 19:19:14,363 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 7700, loss[loss=0.1308, simple_loss=0.1474, pruned_loss=0.04742, audio_tagging_loss=0.009682, over 15560.00 frames. ], tot_loss[loss=0.1008, simple_loss=0.116, pruned_loss=0.03153, audio_tagging_loss=0.01124, over 3049747.02 frames. ], batch size: 58, lr: 1.33e-02, grad_scale: 32.0
2023-11-18 19:19:39,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=372093.3333333333, ans=0.09899494936611666
2023-11-18 19:19:40,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=372093.3333333333, ans=0.125
2023-11-18 19:19:44,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=372093.3333333333, ans=0.125
2023-11-18 19:19:55,029 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=372160.0, ans=0.125
2023-11-18 19:20:05,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=372226.6666666667, ans=0.125
2023-11-18 19:20:08,181 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.371e+01 8.781e+01 9.756e+01 1.085e+02 1.598e+02, threshold=1.951e+02, percent-clipped=0.0
2023-11-18 19:20:10,363 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 7750, loss[loss=0.09393, simple_loss=0.09773, pruned_loss=0.03129, audio_tagging_loss=0.01377, over 14528.00 frames. ], tot_loss[loss=0.1004, simple_loss=0.1152, pruned_loss=0.0313, audio_tagging_loss=0.01149, over 3047909.28 frames. ], batch size: 56, lr: 1.33e-02, grad_scale: 32.0
2023-11-18 19:20:13,257 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.98 vs. limit=15.0
2023-11-18 19:20:23,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=372360.0, ans=0.1
2023-11-18 19:20:26,802 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.26 vs. limit=22.5
2023-11-18 19:20:58,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=372560.0, ans=0.0
2023-11-18 19:21:05,717 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 7800, loss[loss=0.07889, simple_loss=0.09163, pruned_loss=0.02117, audio_tagging_loss=0.0119, over 14869.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1161, pruned_loss=0.03164, audio_tagging_loss=0.0115, over 3039690.81 frames. ], batch size: 55, lr: 1.33e-02, grad_scale: 32.0
2023-11-18 19:21:29,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=372760.0, ans=0.1
2023-11-18 19:21:32,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=372760.0, ans=0.0
2023-11-18 19:21:35,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=372760.0, ans=0.07
2023-11-18 19:21:50,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=372893.3333333333, ans=0.2
2023-11-18 19:21:57,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=372893.3333333333, ans=0.0
2023-11-18 19:22:00,500 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.238e+01 8.836e+01 9.770e+01 1.067e+02 1.448e+02, threshold=1.954e+02, percent-clipped=0.0
2023-11-18 19:22:00,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=372893.3333333333, ans=0.1
2023-11-18 19:22:02,621 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 7850, loss[loss=0.1258, simple_loss=0.1495, pruned_loss=0.04375, audio_tagging_loss=0.007309, over 16464.00 frames. ], tot_loss[loss=0.101, simple_loss=0.1157, pruned_loss=0.03163, audio_tagging_loss=0.01153, over 3049576.22 frames. ], batch size: 58, lr: 1.33e-02, grad_scale: 32.0
2023-11-18 19:22:11,211 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.66 vs. limit=15.0
2023-11-18 19:22:25,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=373093.3333333333, ans=0.0
2023-11-18 19:22:26,599 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=373093.3333333333, ans=0.0
2023-11-18 19:22:27,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=373093.3333333333, ans=0.0
2023-11-18 19:22:29,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=373093.3333333333, ans=0.0
2023-11-18 19:22:35,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=373160.0, ans=0.2
2023-11-18 19:22:58,338 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 7900, loss[loss=0.102, simple_loss=0.1149, pruned_loss=0.03339, audio_tagging_loss=0.01121, over 15463.00 frames. ], tot_loss[loss=0.1018, simple_loss=0.1165, pruned_loss=0.03194, audio_tagging_loss=0.01157, over 3048602.30 frames. ], batch size: 59, lr: 1.33e-02, grad_scale: 32.0
2023-11-18 19:23:26,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=373426.6666666667, ans=0.125
2023-11-18 19:23:32,260 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.69 vs. limit=15.0
2023-11-18 19:23:36,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=373493.3333333333, ans=0.1
2023-11-18 19:23:38,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=373493.3333333333, ans=0.04949747468305833
2023-11-18 19:23:47,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=373560.0, ans=0.2
2023-11-18 19:23:53,283 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 9.204e+01 9.997e+01 1.093e+02 1.252e+02, threshold=1.999e+02, percent-clipped=0.0
2023-11-18 19:23:55,403 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 7950, loss[loss=0.08785, simple_loss=0.09139, pruned_loss=0.03118, audio_tagging_loss=0.01098, over 14963.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.1157, pruned_loss=0.03198, audio_tagging_loss=0.01169, over 3049991.43 frames. ], batch size: 57, lr: 1.33e-02, grad_scale: 32.0
2023-11-18 19:23:59,777 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0
2023-11-18 19:24:08,083 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 19:24:11,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=373693.3333333333, ans=0.125
2023-11-18 19:24:19,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=373760.0, ans=0.0
2023-11-18 19:24:39,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=373893.3333333333, ans=0.0
2023-11-18 19:24:41,887 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=373893.3333333333, ans=0.0
2023-11-18 19:24:51,750 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 8000, loss[loss=0.1215, simple_loss=0.1399, pruned_loss=0.03942, audio_tagging_loss=0.01211, over 15047.00 frames. ], tot_loss[loss=0.1014, simple_loss=0.1157, pruned_loss=0.03184, audio_tagging_loss=0.0117, over 3045373.70 frames. ], batch size: 55, lr: 1.33e-02, grad_scale: 32.0
2023-11-18 19:24:55,118 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.04 vs. limit=22.5
2023-11-18 19:25:14,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=374093.3333333333, ans=0.1
2023-11-18 19:25:22,918 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.66 vs. limit=15.0
2023-11-18 19:25:38,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=374226.6666666667, ans=0.125
2023-11-18 19:25:46,711 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.072e+01 8.997e+01 9.797e+01 1.056e+02 1.371e+02, threshold=1.959e+02, percent-clipped=0.0
2023-11-18 19:25:47,819 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 8050, loss[loss=0.09938, simple_loss=0.115, pruned_loss=0.02886, audio_tagging_loss=0.013, over 15416.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.1156, pruned_loss=0.03203, audio_tagging_loss=0.01169, over 3041941.08 frames. ], batch size: 56, lr: 1.33e-02, grad_scale: 16.0
2023-11-18 19:25:54,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=374293.3333333333, ans=0.0
2023-11-18 19:25:56,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=374293.3333333333, ans=0.1
2023-11-18 19:26:01,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=374360.0, ans=0.125
2023-11-18 19:26:04,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=374360.0, ans=0.0
2023-11-18 19:26:10,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=374426.6666666667, ans=0.125
2023-11-18 19:26:42,248 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.80 vs. limit=12.0
2023-11-18 19:26:42,903 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 8100, loss[loss=0.1245, simple_loss=0.1444, pruned_loss=0.04327, audio_tagging_loss=0.009032, over 15904.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1169, pruned_loss=0.0323, audio_tagging_loss=0.01149, over 3047349.15 frames. ], batch size: 56, lr: 1.33e-02, grad_scale: 16.0
2023-11-18 19:26:52,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=374626.6666666667, ans=0.2
2023-11-18 19:27:04,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=374693.3333333333, ans=0.125
2023-11-18 19:27:09,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=374760.0, ans=0.05
2023-11-18 19:27:38,041 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.231e+01 9.480e+01 1.051e+02 1.132e+02 1.844e+02, threshold=2.102e+02, percent-clipped=0.0
2023-11-18 19:27:39,094 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 8150, loss[loss=0.1181, simple_loss=0.1374, pruned_loss=0.04146, audio_tagging_loss=0.007894, over 15570.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.117, pruned_loss=0.03239, audio_tagging_loss=0.01124, over 3047043.99 frames. ], batch size: 57, lr: 1.33e-02, grad_scale: 16.0
2023-11-18 19:27:49,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=375026.6666666667, ans=0.125
2023-11-18 19:28:13,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=375160.0, ans=0.0
2023-11-18 19:28:29,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=375226.6666666667, ans=0.0
2023-11-18 19:28:30,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=375226.6666666667, ans=0.125
2023-11-18 19:28:34,129 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 19:28:35,181 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 8200, loss[loss=0.07344, simple_loss=0.09048, pruned_loss=0.01901, audio_tagging_loss=0.009184, over 15712.00 frames. ], tot_loss[loss=0.1016, simple_loss=0.1166, pruned_loss=0.03204, audio_tagging_loss=0.01121, over 3045798.43 frames. ], batch size: 58, lr: 1.33e-02, grad_scale: 16.0
2023-11-18 19:28:41,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=375293.3333333333, ans=0.125
2023-11-18 19:28:50,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=375360.0, ans=0.1
2023-11-18 19:28:53,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=375360.0, ans=0.1
2023-11-18 19:28:59,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=375426.6666666667, ans=0.07
2023-11-18 19:29:07,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=375493.3333333333, ans=0.2
2023-11-18 19:29:19,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=375560.0, ans=0.125
2023-11-18 19:29:20,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=375560.0, ans=0.1
2023-11-18 19:29:25,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=375560.0, ans=0.125
2023-11-18 19:29:28,969 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.627e+01 9.797e+01 1.057e+02 1.238e+02 1.453e+02, threshold=2.115e+02, percent-clipped=0.0
], batch size: 57, lr: 1.33e-02, grad_scale: 16.0 2023-11-18 19:29:38,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=375626.6666666667, ans=0.125 2023-11-18 19:29:52,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=375760.0, ans=0.1 2023-11-18 19:30:06,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=375826.6666666667, ans=0.2 2023-11-18 19:30:25,318 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 8300, loss[loss=0.1137, simple_loss=0.1279, pruned_loss=0.03831, audio_tagging_loss=0.01147, over 15720.00 frames. ], tot_loss[loss=0.1016, simple_loss=0.1169, pruned_loss=0.03196, audio_tagging_loss=0.01123, over 3043408.88 frames. ], batch size: 57, lr: 1.33e-02, grad_scale: 16.0 2023-11-18 19:30:29,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=375960.0, ans=0.125 2023-11-18 19:30:50,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=376093.3333333333, ans=0.125 2023-11-18 19:30:52,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=376093.3333333333, ans=0.125 2023-11-18 19:31:11,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=376226.6666666667, ans=0.1 2023-11-18 19:31:19,571 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.771e+01 9.264e+01 1.007e+02 1.092e+02 1.530e+02, threshold=2.015e+02, percent-clipped=0.0 2023-11-18 19:31:21,239 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 8350, loss[loss=0.1166, simple_loss=0.1383, pruned_loss=0.03704, audio_tagging_loss=0.01038, over 15434.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.1179, pruned_loss=0.03209, audio_tagging_loss=0.01109, over 3042713.40 frames. ], batch size: 58, lr: 1.33e-02, grad_scale: 16.0 2023-11-18 19:31:27,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=376293.3333333333, ans=0.04949747468305833 2023-11-18 19:31:35,882 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=15.0 2023-11-18 19:31:35,913 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.30 vs. limit=12.0 2023-11-18 19:31:40,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=376360.0, ans=0.0 2023-11-18 19:31:51,110 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.35 vs. limit=22.5 2023-11-18 19:32:07,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=376560.0, ans=0.125 2023-11-18 19:32:09,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=376560.0, ans=0.2 2023-11-18 19:32:16,812 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 8400, loss[loss=0.1075, simple_loss=0.1284, pruned_loss=0.03421, audio_tagging_loss=0.009092, over 14761.00 frames. 
], tot_loss[loss=0.1008, simple_loss=0.1164, pruned_loss=0.03142, audio_tagging_loss=0.0112, over 3041295.87 frames. ], batch size: 54, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:32:26,617 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.47 vs. limit=15.0 2023-11-18 19:32:47,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=376760.0, ans=0.125 2023-11-18 19:32:53,592 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.01 vs. limit=22.5 2023-11-18 19:32:53,738 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0 2023-11-18 19:32:55,746 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.04 vs. limit=15.0 2023-11-18 19:33:11,381 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.828e+01 9.983e+01 1.109e+02 1.398e+02, threshold=1.997e+02, percent-clipped=0.0 2023-11-18 19:33:12,549 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.43 vs. limit=15.0 2023-11-18 19:33:13,012 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 8450, loss[loss=0.08665, simple_loss=0.09969, pruned_loss=0.02288, audio_tagging_loss=0.01392, over 15590.00 frames. ], tot_loss[loss=0.1017, simple_loss=0.1175, pruned_loss=0.03188, audio_tagging_loss=0.01107, over 3043021.10 frames. ], batch size: 56, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:33:18,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=376960.0, ans=0.0 2023-11-18 19:33:28,941 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.36 vs. limit=22.5 2023-11-18 19:33:42,050 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.34 vs. limit=22.5 2023-11-18 19:33:54,966 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.90 vs. limit=22.5 2023-11-18 19:33:56,881 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.84 vs. limit=15.0 2023-11-18 19:34:03,205 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.23 vs. limit=12.0 2023-11-18 19:34:07,978 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 8500, loss[loss=0.1273, simple_loss=0.1556, pruned_loss=0.04197, audio_tagging_loss=0.007495, over 15530.00 frames. ], tot_loss[loss=0.1019, simple_loss=0.1176, pruned_loss=0.03197, audio_tagging_loss=0.01112, over 3049753.96 frames. ], batch size: 57, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:34:18,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=377360.0, ans=0.015 2023-11-18 19:34:23,464 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.80 vs. 
limit=15.0 2023-11-18 19:34:34,620 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.35 vs. limit=10.0 2023-11-18 19:34:38,445 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.09 vs. limit=15.0 2023-11-18 19:34:44,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=377493.3333333333, ans=0.125 2023-11-18 19:35:03,281 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.332e+01 8.638e+01 9.723e+01 1.079e+02 1.527e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-18 19:35:04,375 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 8550, loss[loss=0.1022, simple_loss=0.1194, pruned_loss=0.02714, audio_tagging_loss=0.01534, over 14958.00 frames. ], tot_loss[loss=0.1017, simple_loss=0.1172, pruned_loss=0.03191, audio_tagging_loss=0.01123, over 3059008.54 frames. ], batch size: 57, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:35:21,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=377693.3333333333, ans=0.2 2023-11-18 19:35:21,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=377693.3333333333, ans=0.125 2023-11-18 19:35:49,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=377893.3333333333, ans=0.1 2023-11-18 19:35:51,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=377893.3333333333, ans=0.125 2023-11-18 19:35:59,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=377960.0, ans=0.025 2023-11-18 19:36:00,029 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 8600, loss[loss=0.08517, simple_loss=0.1012, pruned_loss=0.02311, audio_tagging_loss=0.01145, over 15312.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.1161, pruned_loss=0.03152, audio_tagging_loss=0.0113, over 3052585.89 frames. ], batch size: 58, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:36:20,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=378026.6666666667, ans=0.1 2023-11-18 19:36:25,689 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=22.5 2023-11-18 19:36:54,546 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.235e+01 8.915e+01 9.697e+01 1.106e+02 1.523e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-18 19:36:55,646 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 8650, loss[loss=0.09733, simple_loss=0.1118, pruned_loss=0.03121, audio_tagging_loss=0.01022, over 14986.00 frames. ], tot_loss[loss=0.1022, simple_loss=0.1175, pruned_loss=0.03213, audio_tagging_loss=0.01131, over 3055171.96 frames. ], batch size: 55, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:37:01,060 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.20 vs. 
limit=15.0 2023-11-18 19:37:04,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=378293.3333333333, ans=0.0 2023-11-18 19:37:18,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=378426.6666666667, ans=0.125 2023-11-18 19:37:49,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=378560.0, ans=0.125 2023-11-18 19:37:51,207 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 8700, loss[loss=0.1193, simple_loss=0.1378, pruned_loss=0.0391, audio_tagging_loss=0.01132, over 15912.00 frames. ], tot_loss[loss=0.1017, simple_loss=0.1166, pruned_loss=0.03184, audio_tagging_loss=0.01154, over 3052682.28 frames. ], batch size: 59, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:37:55,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=378626.6666666667, ans=0.125 2023-11-18 19:38:23,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=378760.0, ans=0.0 2023-11-18 19:38:46,595 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.813e+01 9.292e+01 1.037e+02 1.144e+02 1.707e+02, threshold=2.074e+02, percent-clipped=0.0 2023-11-18 19:38:47,730 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 8750, loss[loss=0.1165, simple_loss=0.1275, pruned_loss=0.0423, audio_tagging_loss=0.01049, over 14902.00 frames. ], tot_loss[loss=0.1018, simple_loss=0.1167, pruned_loss=0.03189, audio_tagging_loss=0.01152, over 3053023.76 frames. ], batch size: 55, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:38:49,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2023-11-18 19:39:20,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=379160.0, ans=0.0 2023-11-18 19:39:30,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=379160.0, ans=0.125 2023-11-18 19:39:43,195 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 8800, loss[loss=0.1052, simple_loss=0.1224, pruned_loss=0.03244, audio_tagging_loss=0.01159, over 15226.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.1173, pruned_loss=0.03194, audio_tagging_loss=0.01151, over 3050830.80 frames. ], batch size: 56, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:40:00,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=379360.0, ans=0.125 2023-11-18 19:40:04,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=379426.6666666667, ans=0.2 2023-11-18 19:40:18,547 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.40 vs. 
limit=22.5 2023-11-18 19:40:33,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=379560.0, ans=0.2 2023-11-18 19:40:37,504 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.493e+01 9.218e+01 1.050e+02 1.133e+02 1.971e+02, threshold=2.101e+02, percent-clipped=0.0 2023-11-18 19:40:38,553 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 8850, loss[loss=0.1132, simple_loss=0.1365, pruned_loss=0.03419, audio_tagging_loss=0.01075, over 15549.00 frames. ], tot_loss[loss=0.1022, simple_loss=0.1174, pruned_loss=0.0319, audio_tagging_loss=0.01159, over 3054501.30 frames. ], batch size: 56, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:40:47,037 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 19:40:54,758 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.02 vs. limit=15.0 2023-11-18 19:41:10,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=379826.6666666667, ans=0.025 2023-11-18 19:41:10,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=379826.6666666667, ans=0.125 2023-11-18 19:41:11,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=379826.6666666667, ans=0.125 2023-11-18 19:41:19,465 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 19:41:20,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=379826.6666666667, ans=0.0 2023-11-18 19:41:33,622 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 8900, loss[loss=0.1195, simple_loss=0.1344, pruned_loss=0.03906, audio_tagging_loss=0.01321, over 15614.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.117, pruned_loss=0.03162, audio_tagging_loss=0.01137, over 3053765.33 frames. ], batch size: 57, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:41:37,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=379960.0, ans=0.0 2023-11-18 19:41:42,571 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=379960.0, ans=0.2 2023-11-18 19:41:49,284 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.96 vs. 
limit=15.0 2023-11-18 19:42:04,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=380093.3333333333, ans=0.1 2023-11-18 19:42:07,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=380160.0, ans=0.125 2023-11-18 19:42:14,191 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.91 vs. limit=15.0 2023-11-18 19:42:28,694 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 9.192e+01 1.013e+02 1.118e+02 1.605e+02, threshold=2.026e+02, percent-clipped=0.0 2023-11-18 19:42:29,777 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 8950, loss[loss=0.1369, simple_loss=0.1727, pruned_loss=0.04416, audio_tagging_loss=0.006406, over 15908.00 frames. ], tot_loss[loss=0.1007, simple_loss=0.1163, pruned_loss=0.03137, audio_tagging_loss=0.01122, over 3057676.45 frames. ], batch size: 54, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:42:41,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=380360.0, ans=0.07 2023-11-18 19:42:45,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=380360.0, ans=0.5 2023-11-18 19:42:58,772 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.03 vs. limit=15.0 2023-11-18 19:43:10,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=380493.3333333333, ans=0.125 2023-11-18 19:43:23,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=380560.0, ans=0.04949747468305833 2023-11-18 19:43:25,158 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.94 vs. limit=15.0 2023-11-18 19:43:25,558 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 9000, loss[loss=0.1045, simple_loss=0.1117, pruned_loss=0.03962, audio_tagging_loss=0.009036, over 15115.00 frames. ], tot_loss[loss=0.1016, simple_loss=0.117, pruned_loss=0.032, audio_tagging_loss=0.01113, over 3053764.39 frames. ], batch size: 58, lr: 1.32e-02, grad_scale: 16.0 2023-11-18 19:43:25,559 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-18 19:43:58,396 INFO [train_asr.py:1147] (3/4) Epoch 5, validation: loss=0.07332, simple_loss=0.06001, pruned_loss=0.008857, audio_tagging_loss=0.03446, over 4681554.00 frames. 2023-11-18 19:43:58,397 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-18 19:44:47,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=380893.3333333333, ans=0.07 2023-11-18 19:44:54,106 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 9.240e+01 1.024e+02 1.108e+02 1.437e+02, threshold=2.047e+02, percent-clipped=0.0 2023-11-18 19:44:54,132 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 9050, loss[loss=0.1039, simple_loss=0.12, pruned_loss=0.03288, audio_tagging_loss=0.01097, over 15945.00 frames. ], tot_loss[loss=0.1013, simple_loss=0.1162, pruned_loss=0.03188, audio_tagging_loss=0.01129, over 3052641.69 frames. 
], batch size: 58, lr: 1.32e-02, grad_scale: 16.0 2023-11-18 19:44:59,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=380960.0, ans=0.07 2023-11-18 19:45:07,974 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.18 vs. limit=15.0 2023-11-18 19:45:18,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=381093.3333333333, ans=0.0 2023-11-18 19:45:29,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=381160.0, ans=0.125 2023-11-18 19:45:40,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=381226.6666666667, ans=0.125 2023-11-18 19:45:43,713 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.14 vs. limit=15.0 2023-11-18 19:45:46,053 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.53 vs. limit=10.0 2023-11-18 19:45:49,533 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 9100, loss[loss=0.1029, simple_loss=0.1172, pruned_loss=0.03489, audio_tagging_loss=0.009391, over 16635.00 frames. ], tot_loss[loss=0.1006, simple_loss=0.1156, pruned_loss=0.03164, audio_tagging_loss=0.01118, over 3054025.53 frames. ], batch size: 64, lr: 1.32e-02, grad_scale: 16.0 2023-11-18 19:45:49,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=381293.3333333333, ans=0.125 2023-11-18 19:46:02,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=381360.0, ans=0.025 2023-11-18 19:46:03,330 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.060e-03 2023-11-18 19:46:14,325 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.17 vs. limit=15.0 2023-11-18 19:46:31,419 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=381493.3333333333, ans=0.09899494936611666 2023-11-18 19:46:45,534 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 8.962e+01 1.000e+02 1.098e+02 1.318e+02, threshold=2.000e+02, percent-clipped=0.0 2023-11-18 19:46:45,561 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 9150, loss[loss=0.09167, simple_loss=0.1013, pruned_loss=0.02966, audio_tagging_loss=0.01137, over 15076.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1165, pruned_loss=0.03181, audio_tagging_loss=0.01117, over 3050957.38 frames. ], batch size: 57, lr: 1.32e-02, grad_scale: 16.0 2023-11-18 19:46:55,940 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.48 vs. limit=15.0 2023-11-18 19:46:57,117 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.87 vs. 
limit=15.0 2023-11-18 19:47:33,940 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.32 vs. limit=15.0 2023-11-18 19:47:36,015 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.53 vs. limit=12.0 2023-11-18 19:47:42,471 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 9200, loss[loss=0.1276, simple_loss=0.1457, pruned_loss=0.04502, audio_tagging_loss=0.009664, over 14357.00 frames. ], tot_loss[loss=0.102, simple_loss=0.1172, pruned_loss=0.03224, audio_tagging_loss=0.01121, over 3045387.41 frames. ], batch size: 54, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:47:42,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=381960.0, ans=0.125 2023-11-18 19:47:45,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=381960.0, ans=0.125 2023-11-18 19:47:45,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=381960.0, ans=0.125 2023-11-18 19:47:47,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=381960.0, ans=0.2 2023-11-18 19:47:53,188 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2023-11-18 19:47:54,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=382026.6666666667, ans=0.0 2023-11-18 19:48:30,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=382226.6666666667, ans=0.125 2023-11-18 19:48:37,890 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 9.282e+01 1.040e+02 1.122e+02 1.499e+02, threshold=2.080e+02, percent-clipped=0.0 2023-11-18 19:48:37,918 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 9250, loss[loss=0.06794, simple_loss=0.07646, pruned_loss=0.01857, audio_tagging_loss=0.01113, over 15215.00 frames. ], tot_loss[loss=0.101, simple_loss=0.1161, pruned_loss=0.03177, audio_tagging_loss=0.01117, over 3045387.76 frames. ], batch size: 57, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:48:44,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=382293.3333333333, ans=0.1 2023-11-18 19:48:52,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=382360.0, ans=0.2 2023-11-18 19:49:01,947 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.71 vs. limit=12.0 2023-11-18 19:49:19,007 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.13 vs. limit=10.0 2023-11-18 19:49:33,081 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 9300, loss[loss=0.1112, simple_loss=0.1283, pruned_loss=0.03628, audio_tagging_loss=0.01074, over 14965.00 frames. ], tot_loss[loss=0.1004, simple_loss=0.1154, pruned_loss=0.03137, audio_tagging_loss=0.01132, over 3045232.76 frames. 
], batch size: 57, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:49:48,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=382693.3333333333, ans=0.0 2023-11-18 19:49:59,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=382760.0, ans=0.125 2023-11-18 19:50:01,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=382760.0, ans=0.0 2023-11-18 19:50:09,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=382826.6666666667, ans=0.2 2023-11-18 19:50:10,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=382826.6666666667, ans=0.1 2023-11-18 19:50:11,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=382826.6666666667, ans=0.04949747468305833 2023-11-18 19:50:29,717 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.203e+01 9.054e+01 9.801e+01 1.113e+02 1.567e+02, threshold=1.960e+02, percent-clipped=0.0 2023-11-18 19:50:29,759 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 9350, loss[loss=0.09764, simple_loss=0.1149, pruned_loss=0.0303, audio_tagging_loss=0.00988, over 15441.00 frames. ], tot_loss[loss=0.101, simple_loss=0.1161, pruned_loss=0.03169, audio_tagging_loss=0.01131, over 3042027.62 frames. ], batch size: 58, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:50:30,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=382960.0, ans=0.125 2023-11-18 19:50:32,462 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.74 vs. limit=15.0 2023-11-18 19:50:47,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=383026.6666666667, ans=0.125 2023-11-18 19:51:12,678 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.51 vs. limit=15.0 2023-11-18 19:51:20,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=383226.6666666667, ans=0.125 2023-11-18 19:51:22,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=383226.6666666667, ans=0.125 2023-11-18 19:51:25,404 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 9400, loss[loss=0.08838, simple_loss=0.09518, pruned_loss=0.02896, audio_tagging_loss=0.01183, over 14656.00 frames. ], tot_loss[loss=0.1008, simple_loss=0.1157, pruned_loss=0.03162, audio_tagging_loss=0.01131, over 3046200.42 frames. 
], batch size: 56, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:51:44,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=383360.0, ans=0.125 2023-11-18 19:52:06,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=383493.3333333333, ans=0.125 2023-11-18 19:52:10,771 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.18 vs. limit=6.0 2023-11-18 19:52:17,518 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 19:52:17,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=383560.0, ans=0.0 2023-11-18 19:52:20,607 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.578e+01 8.864e+01 9.867e+01 1.096e+02 1.502e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-18 19:52:20,634 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 9450, loss[loss=0.1167, simple_loss=0.1367, pruned_loss=0.03927, audio_tagging_loss=0.009102, over 15335.00 frames. ], tot_loss[loss=0.1006, simple_loss=0.1154, pruned_loss=0.03154, audio_tagging_loss=0.01137, over 3050712.99 frames. ], batch size: 55, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:52:32,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=383693.3333333333, ans=0.125 2023-11-18 19:52:49,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=383760.0, ans=0.125 2023-11-18 19:53:05,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=383893.3333333333, ans=0.125 2023-11-18 19:53:16,796 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 9500, loss[loss=0.1215, simple_loss=0.1388, pruned_loss=0.04365, audio_tagging_loss=0.008497, over 15169.00 frames. ], tot_loss[loss=0.1006, simple_loss=0.1153, pruned_loss=0.03154, audio_tagging_loss=0.01147, over 3049014.45 frames. 
], batch size: 56, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:53:22,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=383960.0, ans=0.125 2023-11-18 19:53:48,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=384093.3333333333, ans=0.125 2023-11-18 19:53:55,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=384160.0, ans=0.1 2023-11-18 19:54:09,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=384226.6666666667, ans=0.0 2023-11-18 19:54:13,471 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 9.208e+01 1.015e+02 1.091e+02 1.477e+02, threshold=2.029e+02, percent-clipped=0.0 2023-11-18 19:54:13,498 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 9550, loss[loss=0.1202, simple_loss=0.1344, pruned_loss=0.04256, audio_tagging_loss=0.01039, over 14719.00 frames. ], tot_loss[loss=0.1011, simple_loss=0.116, pruned_loss=0.03167, audio_tagging_loss=0.0115, over 3045228.05 frames. ], batch size: 57, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:54:46,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=384493.3333333333, ans=0.125 2023-11-18 19:55:03,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=384560.0, ans=0.125 2023-11-18 19:55:08,354 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 9600, loss[loss=0.1015, simple_loss=0.121, pruned_loss=0.03053, audio_tagging_loss=0.01048, over 15458.00 frames. ], tot_loss[loss=0.1018, simple_loss=0.1167, pruned_loss=0.03191, audio_tagging_loss=0.01156, over 3041879.40 frames. ], batch size: 59, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:55:22,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=384693.3333333333, ans=0.125 2023-11-18 19:55:28,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=384693.3333333333, ans=0.125 2023-11-18 19:55:35,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=384760.0, ans=0.125 2023-11-18 19:55:36,015 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.98 vs. 
limit=10.0 2023-11-18 19:55:39,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=384760.0, ans=0.035 2023-11-18 19:55:55,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=384893.3333333333, ans=0.1 2023-11-18 19:55:58,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=384893.3333333333, ans=0.125 2023-11-18 19:56:01,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=384893.3333333333, ans=0.125 2023-11-18 19:56:04,712 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 9650, loss[loss=0.09579, simple_loss=0.1086, pruned_loss=0.02883, audio_tagging_loss=0.01266, over 15225.00 frames. ], tot_loss[loss=0.1002, simple_loss=0.1147, pruned_loss=0.03122, audio_tagging_loss=0.01163, over 3033712.36 frames. ], batch size: 57, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:56:05,739 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.450e+01 8.741e+01 9.505e+01 1.064e+02 1.391e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-18 19:56:25,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=385026.6666666667, ans=0.2 2023-11-18 19:56:44,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=385160.0, ans=0.1 2023-11-18 19:56:54,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=385226.6666666667, ans=0.125 2023-11-18 19:57:00,514 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 9700, loss[loss=0.04767, simple_loss=0.05393, pruned_loss=0.01142, audio_tagging_loss=0.009282, over 14790.00 frames. ], tot_loss[loss=0.09999, simple_loss=0.1146, pruned_loss=0.03123, audio_tagging_loss=0.01146, over 3043267.10 frames. ], batch size: 56, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:57:08,146 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0 2023-11-18 19:57:20,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=385360.0, ans=0.125 2023-11-18 19:57:31,389 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.84 vs. limit=10.0 2023-11-18 19:57:31,548 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.36 vs. limit=15.0 2023-11-18 19:57:32,274 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.348e-01 2023-11-18 19:57:32,574 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.79 vs. limit=10.0 2023-11-18 19:57:34,879 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.95 vs. 
limit=15.0 2023-11-18 19:57:36,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=385493.3333333333, ans=0.0 2023-11-18 19:57:45,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=385560.0, ans=0.0 2023-11-18 19:57:54,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=385560.0, ans=0.125 2023-11-18 19:57:56,526 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 9750, loss[loss=0.0978, simple_loss=0.1048, pruned_loss=0.0331, audio_tagging_loss=0.01232, over 13741.00 frames. ], tot_loss[loss=0.09925, simple_loss=0.1142, pruned_loss=0.03085, audio_tagging_loss=0.0113, over 3038818.83 frames. ], batch size: 53, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:57:57,521 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.008e+01 9.015e+01 1.026e+02 1.125e+02 1.667e+02, threshold=2.051e+02, percent-clipped=0.0 2023-11-18 19:58:00,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=385626.6666666667, ans=0.1 2023-11-18 19:58:25,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=385760.0, ans=0.0 2023-11-18 19:58:27,139 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 19:58:32,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=385826.6666666667, ans=0.1 2023-11-18 19:58:43,144 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.99 vs. limit=22.5 2023-11-18 19:58:43,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=385893.3333333333, ans=0.125 2023-11-18 19:58:52,984 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 9800, loss[loss=0.07703, simple_loss=0.09245, pruned_loss=0.01938, audio_tagging_loss=0.01142, over 14428.00 frames. ], tot_loss[loss=0.0995, simple_loss=0.1147, pruned_loss=0.03091, audio_tagging_loss=0.01125, over 3039940.31 frames. ], batch size: 56, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:58:59,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=385960.0, ans=0.0 2023-11-18 19:59:04,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=386026.6666666667, ans=0.0 2023-11-18 19:59:34,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=386160.0, ans=0.125 2023-11-18 19:59:40,927 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 19:59:48,947 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 9850, loss[loss=0.109, simple_loss=0.1319, pruned_loss=0.0307, audio_tagging_loss=0.01234, over 15252.00 frames. ], tot_loss[loss=0.1003, simple_loss=0.1159, pruned_loss=0.0312, audio_tagging_loss=0.01119, over 3042806.02 frames. ], batch size: 54, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:59:50,000 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.077e+01 9.044e+01 9.858e+01 1.082e+02 1.412e+02, threshold=1.972e+02, percent-clipped=0.0 2023-11-18 19:59:51,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=386293.3333333333, ans=0.05 2023-11-18 20:00:05,217 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.85 vs. limit=15.0 2023-11-18 20:00:28,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=386493.3333333333, ans=0.2 2023-11-18 20:00:44,507 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 9900, loss[loss=0.125, simple_loss=0.146, pruned_loss=0.0428, audio_tagging_loss=0.009142, over 15475.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1168, pruned_loss=0.0317, audio_tagging_loss=0.01106, over 3039744.50 frames. ], batch size: 58, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 20:00:52,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=386626.6666666667, ans=0.1 2023-11-18 20:00:53,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=386626.6666666667, ans=0.1 2023-11-18 20:01:00,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=386693.3333333333, ans=0.125 2023-11-18 20:01:12,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=386760.0, ans=0.125 2023-11-18 20:01:21,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=386826.6666666667, ans=0.125 2023-11-18 20:01:21,745 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.81 vs. limit=15.0 2023-11-18 20:01:38,945 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.31 vs. limit=6.0 2023-11-18 20:01:40,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=386960.0, ans=0.1 2023-11-18 20:01:41,666 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 9950, loss[loss=0.06192, simple_loss=0.06957, pruned_loss=0.01616, audio_tagging_loss=0.01097, over 15212.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1171, pruned_loss=0.03159, audio_tagging_loss=0.01105, over 3043458.67 frames. 
], batch size: 59, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 20:01:42,672 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.684e+01 9.823e+01 1.146e+02 1.516e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-18 20:01:58,650 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0 2023-11-18 20:02:00,697 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.93 vs. limit=15.0 2023-11-18 20:02:02,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=387093.3333333333, ans=0.025 2023-11-18 20:02:07,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=387093.3333333333, ans=0.95 2023-11-18 20:02:21,499 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.90 vs. limit=15.0 2023-11-18 20:02:26,712 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2023-11-18 20:02:27,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=387226.6666666667, ans=0.125 2023-11-18 20:02:29,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=387226.6666666667, ans=0.125 2023-11-18 20:02:36,726 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 10000, loss[loss=0.06732, simple_loss=0.06802, pruned_loss=0.01765, audio_tagging_loss=0.01567, over 15983.00 frames. ], tot_loss[loss=0.1014, simple_loss=0.1172, pruned_loss=0.03175, audio_tagging_loss=0.01109, over 3047668.67 frames. ], batch size: 61, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 20:02:39,903 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.40 vs. limit=15.0 2023-11-18 20:02:40,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=387293.3333333333, ans=0.2 2023-11-18 20:02:46,417 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.50 vs. limit=15.0 2023-11-18 20:02:51,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=387360.0, ans=0.0 2023-11-18 20:02:57,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=387426.6666666667, ans=0.0 2023-11-18 20:03:12,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=387493.3333333333, ans=0.0 2023-11-18 20:03:14,449 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.95 vs. limit=12.0 2023-11-18 20:03:32,499 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 10050, loss[loss=0.1007, simple_loss=0.1119, pruned_loss=0.0329, audio_tagging_loss=0.0118, over 14896.00 frames. 
], tot_loss[loss=0.1007, simple_loss=0.1162, pruned_loss=0.03141, audio_tagging_loss=0.01121, over 3047916.73 frames. ], batch size: 58, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 20:03:33,530 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.559e+01 9.098e+01 9.898e+01 1.122e+02 1.719e+02, threshold=1.980e+02, percent-clipped=0.0 2023-11-18 20:03:43,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=387693.3333333333, ans=0.0 2023-11-18 20:03:49,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=387693.3333333333, ans=0.125 2023-11-18 20:03:56,595 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.18 vs. limit=12.0 2023-11-18 20:04:10,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=387826.6666666667, ans=0.1 2023-11-18 20:04:11,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=387826.6666666667, ans=0.125 2023-11-18 20:04:17,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=387893.3333333333, ans=0.09899494936611666 2023-11-18 20:04:23,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=387893.3333333333, ans=0.125 2023-11-18 20:04:24,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=387893.3333333333, ans=0.0 2023-11-18 20:04:28,973 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 10100, loss[loss=0.09783, simple_loss=0.1171, pruned_loss=0.03045, audio_tagging_loss=0.00881, over 16364.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1169, pruned_loss=0.03156, audio_tagging_loss=0.01117, over 3050138.26 frames. ], batch size: 60, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 20:04:30,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=387960.0, ans=0.0 2023-11-18 20:04:43,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=388026.6666666667, ans=0.1 2023-11-18 20:04:44,765 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.89 vs. limit=22.5 2023-11-18 20:05:04,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=388160.0, ans=0.125 2023-11-18 20:05:12,175 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 20:05:13,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=388226.6666666667, ans=0.1 2023-11-18 20:05:13,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=388226.6666666667, ans=0.125 2023-11-18 20:05:18,150 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.12 vs. limit=15.0 2023-11-18 20:05:23,847 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 10150, loss[loss=0.1327, simple_loss=0.1573, pruned_loss=0.04272, audio_tagging_loss=0.01132, over 15628.00 frames. ], tot_loss[loss=0.1014, simple_loss=0.1171, pruned_loss=0.03156, audio_tagging_loss=0.01126, over 3054427.52 frames. ], batch size: 58, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:05:24,856 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.804e+01 9.203e+01 1.000e+02 1.096e+02 2.259e+02, threshold=2.001e+02, percent-clipped=1.0 2023-11-18 20:05:43,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=388360.0, ans=0.2 2023-11-18 20:05:47,618 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 20:06:05,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=388493.3333333333, ans=0.0 2023-11-18 20:06:16,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=388560.0, ans=0.0 2023-11-18 20:06:18,971 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.40 vs. limit=15.0 2023-11-18 20:06:19,302 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 10200, loss[loss=0.08339, simple_loss=0.1001, pruned_loss=0.02403, audio_tagging_loss=0.009311, over 14845.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1168, pruned_loss=0.03152, audio_tagging_loss=0.01135, over 3056383.82 frames. ], batch size: 57, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:06:21,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=388626.6666666667, ans=0.07 2023-11-18 20:06:40,043 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 20:06:42,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=388760.0, ans=0.1 2023-11-18 20:06:55,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=388826.6666666667, ans=0.125 2023-11-18 20:07:14,929 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 10250, loss[loss=0.09721, simple_loss=0.1104, pruned_loss=0.03013, audio_tagging_loss=0.01187, over 15168.00 frames. ], tot_loss[loss=0.1016, simple_loss=0.1169, pruned_loss=0.0318, audio_tagging_loss=0.01137, over 3051474.76 frames. ], batch size: 57, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:07:15,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=388960.0, ans=0.1 2023-11-18 20:07:15,959 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.430e+01 9.102e+01 9.857e+01 1.065e+02 1.324e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-18 20:07:17,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=388960.0, ans=0.125 2023-11-18 20:07:18,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=388960.0, ans=0.1 2023-11-18 20:07:24,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=388960.0, ans=0.125 2023-11-18 20:07:34,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=389026.6666666667, ans=0.0 2023-11-18 20:07:35,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=389026.6666666667, ans=0.09899494936611666 2023-11-18 20:07:41,373 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.21 vs. limit=6.0 2023-11-18 20:07:56,003 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:08:02,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=389226.6666666667, ans=0.125 2023-11-18 20:08:05,933 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.31 vs. limit=15.0 2023-11-18 20:08:11,208 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 10300, loss[loss=0.11, simple_loss=0.129, pruned_loss=0.03281, audio_tagging_loss=0.01266, over 15578.00 frames. ], tot_loss[loss=0.1013, simple_loss=0.1164, pruned_loss=0.03168, audio_tagging_loss=0.01145, over 3045577.99 frames. ], batch size: 57, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:08:22,090 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.51 vs. limit=15.0 2023-11-18 20:08:27,605 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.92 vs. 
limit=15.0 2023-11-18 20:08:52,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=389493.3333333333, ans=0.1 2023-11-18 20:09:07,702 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 10350, loss[loss=0.122, simple_loss=0.1254, pruned_loss=0.04735, audio_tagging_loss=0.01192, over 14952.00 frames. ], tot_loss[loss=0.1024, simple_loss=0.1178, pruned_loss=0.03204, audio_tagging_loss=0.01147, over 3044790.80 frames. ], batch size: 56, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:09:08,732 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 9.314e+01 1.056e+02 1.157e+02 1.834e+02, threshold=2.113e+02, percent-clipped=0.0 2023-11-18 20:09:19,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=389693.3333333333, ans=0.125 2023-11-18 20:09:29,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=389760.0, ans=0.125 2023-11-18 20:09:52,908 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.84 vs. limit=15.0 2023-11-18 20:09:55,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=389893.3333333333, ans=0.125 2023-11-18 20:09:58,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=389893.3333333333, ans=0.125 2023-11-18 20:10:02,905 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 10400, loss[loss=0.1029, simple_loss=0.1197, pruned_loss=0.03099, audio_tagging_loss=0.01209, over 14859.00 frames. ], tot_loss[loss=0.1025, simple_loss=0.1174, pruned_loss=0.03222, audio_tagging_loss=0.01162, over 3042385.50 frames. ], batch size: 56, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:10:25,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=390093.3333333333, ans=0.1 2023-11-18 20:10:30,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=390093.3333333333, ans=0.0 2023-11-18 20:10:31,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=390093.3333333333, ans=0.125 2023-11-18 20:10:39,994 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.01 vs. limit=22.5 2023-11-18 20:10:59,426 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 10450, loss[loss=0.09257, simple_loss=0.1088, pruned_loss=0.02963, audio_tagging_loss=0.008556, over 15102.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.1175, pruned_loss=0.03234, audio_tagging_loss=0.01162, over 3042225.64 frames. ], batch size: 57, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:11:00,428 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.809e+01 9.608e+01 1.086e+02 1.646e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-18 20:11:12,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=390360.0, ans=0.0 2023-11-18 20:11:15,892 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.23 vs. 
limit=15.0 2023-11-18 20:11:28,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=390426.6666666667, ans=0.1 2023-11-18 20:11:41,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=390493.3333333333, ans=0.125 2023-11-18 20:11:55,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=390626.6666666667, ans=0.125 2023-11-18 20:11:55,867 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 10500, loss[loss=0.09901, simple_loss=0.103, pruned_loss=0.03349, audio_tagging_loss=0.014, over 14827.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.1163, pruned_loss=0.03193, audio_tagging_loss=0.01145, over 3039227.06 frames. ], batch size: 57, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:11:58,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=390626.6666666667, ans=0.125 2023-11-18 20:12:03,228 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.70 vs. limit=12.0 2023-11-18 20:12:16,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=390693.3333333333, ans=0.125 2023-11-18 20:12:19,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=390760.0, ans=0.125 2023-11-18 20:12:28,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=390826.6666666667, ans=0.125 2023-11-18 20:12:28,838 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.79 vs. limit=15.0 2023-11-18 20:12:33,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=390826.6666666667, ans=0.0 2023-11-18 20:12:33,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=390826.6666666667, ans=0.125 2023-11-18 20:12:40,326 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.20 vs. limit=15.0 2023-11-18 20:12:51,608 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 10550, loss[loss=0.1014, simple_loss=0.1237, pruned_loss=0.02973, audio_tagging_loss=0.009833, over 15360.00 frames. ], tot_loss[loss=0.1017, simple_loss=0.117, pruned_loss=0.03204, audio_tagging_loss=0.01117, over 3041438.20 frames. ], batch size: 59, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:12:52,614 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.618e+01 8.716e+01 9.677e+01 1.046e+02 1.546e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-18 20:12:56,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=390960.0, ans=0.125 2023-11-18 20:12:58,836 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.57 vs. 
limit=22.5 2023-11-18 20:13:11,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=391026.6666666667, ans=0.2 2023-11-18 20:13:47,309 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 10600, loss[loss=0.1002, simple_loss=0.1117, pruned_loss=0.03211, audio_tagging_loss=0.01218, over 16576.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.1177, pruned_loss=0.03216, audio_tagging_loss=0.01113, over 3044282.65 frames. ], batch size: 61, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:13:48,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=391293.3333333333, ans=0.0 2023-11-18 20:13:56,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=391293.3333333333, ans=0.125 2023-11-18 20:14:05,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=391360.0, ans=0.125 2023-11-18 20:14:06,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=391360.0, ans=0.2 2023-11-18 20:14:28,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=391493.3333333333, ans=0.125 2023-11-18 20:14:28,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=391493.3333333333, ans=0.125 2023-11-18 20:14:31,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=391560.0, ans=0.125 2023-11-18 20:14:33,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=391560.0, ans=0.125 2023-11-18 20:14:43,658 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 10650, loss[loss=0.1157, simple_loss=0.1386, pruned_loss=0.03908, audio_tagging_loss=0.007293, over 14378.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.1178, pruned_loss=0.03215, audio_tagging_loss=0.01107, over 3039166.45 frames. ], batch size: 53, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:14:44,671 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 9.141e+01 1.015e+02 1.176e+02 1.580e+02, threshold=2.030e+02, percent-clipped=0.0 2023-11-18 20:14:57,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=391693.3333333333, ans=0.0 2023-11-18 20:15:05,478 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.05 vs. limit=10.0 2023-11-18 20:15:08,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=391760.0, ans=0.1 2023-11-18 20:15:14,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=391760.0, ans=0.5 2023-11-18 20:15:23,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=391826.6666666667, ans=0.07 2023-11-18 20:15:25,422 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.20 vs. 
limit=15.0 2023-11-18 20:15:38,740 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 10700, loss[loss=0.1031, simple_loss=0.1222, pruned_loss=0.03362, audio_tagging_loss=0.008409, over 14982.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1182, pruned_loss=0.03211, audio_tagging_loss=0.01104, over 3040640.28 frames. ], batch size: 56, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:15:52,535 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.14 vs. limit=15.0 2023-11-18 20:15:54,475 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.97 vs. limit=15.0 2023-11-18 20:16:12,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.29 vs. limit=15.0 2023-11-18 20:16:19,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=392160.0, ans=0.2 2023-11-18 20:16:22,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=392160.0, ans=0.1 2023-11-18 20:16:24,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=392226.6666666667, ans=0.0 2023-11-18 20:16:35,674 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 10750, loss[loss=0.09426, simple_loss=0.1025, pruned_loss=0.02728, audio_tagging_loss=0.01572, over 14603.00 frames. ], tot_loss[loss=0.1013, simple_loss=0.1171, pruned_loss=0.03173, audio_tagging_loss=0.01105, over 3047094.68 frames. ], batch size: 55, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:16:36,726 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.277e+01 9.086e+01 9.851e+01 1.129e+02 1.490e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-18 20:16:42,915 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.12 vs. limit=6.0 2023-11-18 20:16:51,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=392360.0, ans=0.0 2023-11-18 20:16:51,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=392360.0, ans=0.0 2023-11-18 20:16:52,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=392360.0, ans=0.125 2023-11-18 20:16:54,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=392360.0, ans=0.125 2023-11-18 20:17:21,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=392560.0, ans=0.04949747468305833 2023-11-18 20:17:26,097 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.45 vs. limit=12.0 2023-11-18 20:17:29,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=392560.0, ans=0.0 2023-11-18 20:17:31,498 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 10800, loss[loss=0.07903, simple_loss=0.08547, pruned_loss=0.02506, audio_tagging_loss=0.01123, over 15644.00 frames. 
], tot_loss[loss=0.1004, simple_loss=0.1158, pruned_loss=0.0315, audio_tagging_loss=0.01102, over 3049379.79 frames. ], batch size: 62, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:18:25,693 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:18:25,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=392893.3333333333, ans=0.2 2023-11-18 20:18:27,599 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 10850, loss[loss=0.08059, simple_loss=0.09878, pruned_loss=0.02044, audio_tagging_loss=0.01076, over 14153.00 frames. ], tot_loss[loss=0.1013, simple_loss=0.117, pruned_loss=0.03181, audio_tagging_loss=0.01104, over 3045523.54 frames. ], batch size: 55, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:18:28,582 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.407e+01 9.217e+01 1.010e+02 1.123e+02 1.956e+02, threshold=2.020e+02, percent-clipped=0.0 2023-11-18 20:18:35,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=392960.0, ans=0.05 2023-11-18 20:18:54,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=393093.3333333333, ans=0.125 2023-11-18 20:19:00,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=393160.0, ans=0.1 2023-11-18 20:19:00,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=393160.0, ans=0.125 2023-11-18 20:19:14,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=393226.6666666667, ans=0.02 2023-11-18 20:19:19,163 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 20:19:19,558 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.74 vs. limit=22.5 2023-11-18 20:19:20,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=393226.6666666667, ans=0.125 2023-11-18 20:19:23,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=393293.3333333333, ans=0.2 2023-11-18 20:19:24,008 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 10900, loss[loss=0.1139, simple_loss=0.1271, pruned_loss=0.04003, audio_tagging_loss=0.01036, over 15357.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.117, pruned_loss=0.03183, audio_tagging_loss=0.01121, over 3048111.45 frames. ], batch size: 57, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:19:28,957 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.80 vs. 
limit=15.0 2023-11-18 20:19:29,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=393293.3333333333, ans=0.125 2023-11-18 20:20:10,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=393560.0, ans=0.125 2023-11-18 20:20:13,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=393560.0, ans=0.0 2023-11-18 20:20:20,065 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 10950, loss[loss=0.1155, simple_loss=0.1312, pruned_loss=0.04042, audio_tagging_loss=0.009439, over 15134.00 frames. ], tot_loss[loss=0.1013, simple_loss=0.1166, pruned_loss=0.03177, audio_tagging_loss=0.01121, over 3052791.84 frames. ], batch size: 54, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:20:21,116 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 9.174e+01 1.016e+02 1.114e+02 1.629e+02, threshold=2.031e+02, percent-clipped=0.0 2023-11-18 20:20:30,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=393693.3333333333, ans=0.125 2023-11-18 20:20:59,228 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.77 vs. limit=22.5 2023-11-18 20:21:11,500 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.59 vs. limit=15.0 2023-11-18 20:21:15,286 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 11000, loss[loss=0.1195, simple_loss=0.1263, pruned_loss=0.03918, audio_tagging_loss=0.01719, over 13579.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.116, pruned_loss=0.03155, audio_tagging_loss=0.01135, over 3049528.10 frames. ], batch size: 53, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:21:15,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=393960.0, ans=0.0 2023-11-18 20:21:20,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=393960.0, ans=0.125 2023-11-18 20:21:21,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=393960.0, ans=0.125 2023-11-18 20:21:23,339 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 20:21:27,849 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0 2023-11-18 20:21:29,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=394026.6666666667, ans=0.125 2023-11-18 20:21:53,003 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.00 vs. 
limit=22.5 2023-11-18 20:22:00,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=394226.6666666667, ans=0.125 2023-11-18 20:22:11,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=394293.3333333333, ans=0.0 2023-11-18 20:22:12,139 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 11050, loss[loss=0.1097, simple_loss=0.1326, pruned_loss=0.0323, audio_tagging_loss=0.01107, over 14946.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1163, pruned_loss=0.03167, audio_tagging_loss=0.01137, over 3052610.12 frames. ], batch size: 57, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:22:13,192 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.785e+01 9.478e+01 1.012e+02 1.085e+02 1.543e+02, threshold=2.025e+02, percent-clipped=0.0 2023-11-18 20:22:30,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=394360.0, ans=0.1 2023-11-18 20:22:53,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=394493.3333333333, ans=0.125 2023-11-18 20:23:07,218 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 11100, loss[loss=0.1084, simple_loss=0.1267, pruned_loss=0.03371, audio_tagging_loss=0.01134, over 14579.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.1159, pruned_loss=0.03141, audio_tagging_loss=0.01155, over 3048007.30 frames. ], batch size: 55, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:23:19,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=394693.3333333333, ans=0.2 2023-11-18 20:23:23,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=394693.3333333333, ans=0.95 2023-11-18 20:23:39,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=394760.0, ans=0.025 2023-11-18 20:23:42,165 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.31 vs. limit=15.0 2023-11-18 20:23:44,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=394826.6666666667, ans=0.125 2023-11-18 20:24:01,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=394893.3333333333, ans=0.0 2023-11-18 20:24:03,468 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 11150, loss[loss=0.08704, simple_loss=0.09639, pruned_loss=0.02515, audio_tagging_loss=0.0137, over 14961.00 frames. ], tot_loss[loss=0.1014, simple_loss=0.1164, pruned_loss=0.03163, audio_tagging_loss=0.01159, over 3046036.46 frames. ], batch size: 57, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:24:04,471 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.464e+01 9.395e+01 1.022e+02 1.169e+02 1.423e+02, threshold=2.044e+02, percent-clipped=0.0 2023-11-18 20:24:06,098 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.29 vs. 
limit=10.0 2023-11-18 20:24:07,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=394960.0, ans=0.125 2023-11-18 20:24:07,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=394960.0, ans=0.125 2023-11-18 20:24:10,260 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.70 vs. limit=22.5 2023-11-18 20:24:16,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=395026.6666666667, ans=0.1 2023-11-18 20:24:29,020 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.65 vs. limit=15.0 2023-11-18 20:24:35,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=395093.3333333333, ans=0.1 2023-11-18 20:24:59,121 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 11200, loss[loss=0.09346, simple_loss=0.1188, pruned_loss=0.02484, audio_tagging_loss=0.009227, over 15179.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.116, pruned_loss=0.03155, audio_tagging_loss=0.01166, over 3049810.52 frames. ], batch size: 56, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:25:01,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=395293.3333333333, ans=0.2 2023-11-18 20:25:13,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=395360.0, ans=0.125 2023-11-18 20:25:17,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=395360.0, ans=0.0 2023-11-18 20:25:18,378 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0 2023-11-18 20:25:33,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=395493.3333333333, ans=0.07 2023-11-18 20:25:55,415 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 11250, loss[loss=0.09807, simple_loss=0.1029, pruned_loss=0.03454, audio_tagging_loss=0.01208, over 15717.00 frames. ], tot_loss[loss=0.1, simple_loss=0.1145, pruned_loss=0.0312, audio_tagging_loss=0.01159, over 3044828.67 frames. 
], batch size: 60, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:25:55,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=395626.6666666667, ans=0.125 2023-11-18 20:25:56,448 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.620e+01 9.426e+01 1.024e+02 1.146e+02 1.822e+02, threshold=2.048e+02, percent-clipped=0.0 2023-11-18 20:26:14,072 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:26:16,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=395760.0, ans=0.125 2023-11-18 20:26:19,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=395760.0, ans=0.0 2023-11-18 20:26:23,433 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.54 vs. limit=15.0 2023-11-18 20:26:45,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=395893.3333333333, ans=0.0 2023-11-18 20:26:48,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=395893.3333333333, ans=0.125 2023-11-18 20:26:50,727 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 11300, loss[loss=0.1394, simple_loss=0.1592, pruned_loss=0.052, audio_tagging_loss=0.007827, over 15405.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.1159, pruned_loss=0.03156, audio_tagging_loss=0.01139, over 3042085.15 frames. ], batch size: 57, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:27:01,147 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.19 vs. limit=15.0 2023-11-18 20:27:04,060 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=396026.6666666667, ans=0.0 2023-11-18 20:27:14,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=396093.3333333333, ans=0.125 2023-11-18 20:27:44,343 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.67 vs. limit=15.0 2023-11-18 20:27:45,762 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 11350, loss[loss=0.1089, simple_loss=0.1305, pruned_loss=0.02847, audio_tagging_loss=0.01516, over 16430.00 frames. ], tot_loss[loss=0.1002, simple_loss=0.1154, pruned_loss=0.03103, audio_tagging_loss=0.01143, over 3040877.27 frames. ], batch size: 59, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:27:46,826 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.780e+01 9.361e+01 1.045e+02 1.135e+02 1.699e+02, threshold=2.091e+02, percent-clipped=0.0 2023-11-18 20:27:51,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=396293.3333333333, ans=0.2 2023-11-18 20:27:51,675 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.07 vs. 
limit=22.5 2023-11-18 20:28:09,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=396426.6666666667, ans=0.125 2023-11-18 20:28:13,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=396426.6666666667, ans=0.0 2023-11-18 20:28:23,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=396493.3333333333, ans=0.125 2023-11-18 20:28:23,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=396493.3333333333, ans=0.0 2023-11-18 20:28:26,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=396493.3333333333, ans=0.1 2023-11-18 20:28:28,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=396560.0, ans=0.0 2023-11-18 20:28:31,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=396560.0, ans=0.125 2023-11-18 20:28:38,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=396560.0, ans=0.0 2023-11-18 20:28:42,015 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 11400, loss[loss=0.1105, simple_loss=0.1264, pruned_loss=0.0384, audio_tagging_loss=0.008885, over 14997.00 frames. ], tot_loss[loss=0.1013, simple_loss=0.1168, pruned_loss=0.0316, audio_tagging_loss=0.01131, over 3045404.80 frames. ], batch size: 57, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:28:52,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=396693.3333333333, ans=0.125 2023-11-18 20:29:10,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=396760.0, ans=0.125 2023-11-18 20:29:37,081 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 11450, loss[loss=0.1292, simple_loss=0.1502, pruned_loss=0.04522, audio_tagging_loss=0.008842, over 16709.00 frames. ], tot_loss[loss=0.1013, simple_loss=0.1166, pruned_loss=0.03178, audio_tagging_loss=0.01122, over 3048388.67 frames. ], batch size: 59, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:29:38,112 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.945e+01 1.000e+02 1.081e+02 1.401e+02, threshold=2.001e+02, percent-clipped=0.0 2023-11-18 20:29:39,695 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.87 vs. 
limit=15.0 2023-11-18 20:29:40,624 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:30:03,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=397093.3333333333, ans=0.0 2023-11-18 20:30:09,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=397160.0, ans=0.125 2023-11-18 20:30:16,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=397160.0, ans=0.125 2023-11-18 20:30:29,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=397226.6666666667, ans=0.015 2023-11-18 20:30:32,398 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 11500, loss[loss=0.09764, simple_loss=0.1156, pruned_loss=0.03026, audio_tagging_loss=0.009586, over 15506.00 frames. ], tot_loss[loss=0.101, simple_loss=0.1163, pruned_loss=0.03158, audio_tagging_loss=0.01125, over 3045087.14 frames. ], batch size: 57, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:30:45,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=397360.0, ans=0.0 2023-11-18 20:30:58,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=397426.6666666667, ans=0.1 2023-11-18 20:31:29,298 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 11550, loss[loss=0.1012, simple_loss=0.1253, pruned_loss=0.02912, audio_tagging_loss=0.009468, over 16084.00 frames. ], tot_loss[loss=0.1002, simple_loss=0.1155, pruned_loss=0.03117, audio_tagging_loss=0.01126, over 3047050.96 frames. ], batch size: 59, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:31:30,301 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.929e+01 8.927e+01 9.792e+01 1.098e+02 1.308e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-18 20:31:36,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=397626.6666666667, ans=0.1 2023-11-18 20:31:56,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=397760.0, ans=0.125 2023-11-18 20:32:00,554 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 20:32:19,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=397893.3333333333, ans=0.125 2023-11-18 20:32:22,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=397893.3333333333, ans=0.125 2023-11-18 20:32:24,908 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 11600, loss[loss=0.09703, simple_loss=0.1114, pruned_loss=0.03037, audio_tagging_loss=0.01095, over 14414.00 frames. 
], tot_loss[loss=0.1, simple_loss=0.1152, pruned_loss=0.03115, audio_tagging_loss=0.01128, over 3045836.46 frames. ], batch size: 56, lr: 1.29e-02, grad_scale: 64.0 2023-11-18 20:32:27,550 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.36 vs. limit=15.0 2023-11-18 20:33:20,115 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 11650, loss[loss=0.1009, simple_loss=0.1183, pruned_loss=0.03235, audio_tagging_loss=0.009428, over 15392.00 frames. ], tot_loss[loss=0.09994, simple_loss=0.1151, pruned_loss=0.03111, audio_tagging_loss=0.0113, over 3046929.27 frames. ], batch size: 58, lr: 1.29e-02, grad_scale: 64.0 2023-11-18 20:33:21,159 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 8.987e+01 1.026e+02 1.150e+02 1.533e+02, threshold=2.053e+02, percent-clipped=0.0 2023-11-18 20:33:21,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=398293.3333333333, ans=0.125 2023-11-18 20:33:23,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=398293.3333333333, ans=0.125 2023-11-18 20:33:34,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=398360.0, ans=0.1 2023-11-18 20:33:48,142 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.33 vs. limit=15.0 2023-11-18 20:33:51,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=398426.6666666667, ans=0.125 2023-11-18 20:34:16,073 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 11700, loss[loss=0.1317, simple_loss=0.161, pruned_loss=0.04089, audio_tagging_loss=0.01029, over 15879.00 frames. ], tot_loss[loss=0.1007, simple_loss=0.1159, pruned_loss=0.03137, audio_tagging_loss=0.01135, over 3049128.46 frames. ], batch size: 57, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:34:34,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=398693.3333333333, ans=0.125 2023-11-18 20:34:39,510 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.11 vs. limit=15.0 2023-11-18 20:34:41,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=398760.0, ans=0.125 2023-11-18 20:35:12,935 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 11750, loss[loss=0.1068, simple_loss=0.1207, pruned_loss=0.03451, audio_tagging_loss=0.0119, over 15085.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1161, pruned_loss=0.03175, audio_tagging_loss=0.0114, over 3053396.11 frames. 
], batch size: 56, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:35:15,031 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.373e+01 8.870e+01 9.922e+01 1.106e+02 1.477e+02, threshold=1.984e+02, percent-clipped=0.0 2023-11-18 20:35:16,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=398960.0, ans=0.125 2023-11-18 20:35:38,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=399093.3333333333, ans=0.125 2023-11-18 20:35:40,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=399093.3333333333, ans=0.125 2023-11-18 20:35:44,301 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.26 vs. limit=10.0 2023-11-18 20:35:47,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=399160.0, ans=0.1 2023-11-18 20:36:04,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=399226.6666666667, ans=0.125 2023-11-18 20:36:08,088 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 11800, loss[loss=0.07891, simple_loss=0.0839, pruned_loss=0.02116, audio_tagging_loss=0.0158, over 14676.00 frames. ], tot_loss[loss=0.1003, simple_loss=0.1148, pruned_loss=0.03139, audio_tagging_loss=0.01151, over 3048398.50 frames. ], batch size: 57, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:36:11,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=399293.3333333333, ans=10.0 2023-11-18 20:36:15,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=399293.3333333333, ans=0.2 2023-11-18 20:36:23,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=399360.0, ans=0.1 2023-11-18 20:36:29,845 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.06 vs. limit=15.0 2023-11-18 20:36:37,992 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:36:38,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=399426.6666666667, ans=0.5 2023-11-18 20:37:04,187 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 11850, loss[loss=0.07237, simple_loss=0.0802, pruned_loss=0.01921, audio_tagging_loss=0.01306, over 15922.00 frames. ], tot_loss[loss=0.1006, simple_loss=0.1151, pruned_loss=0.0315, audio_tagging_loss=0.01148, over 3043278.26 frames. 
], batch size: 64, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:37:06,259 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.830e+01 9.778e+01 1.086e+02 1.428e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-18 20:37:21,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=399693.3333333333, ans=0.125 2023-11-18 20:37:28,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=399760.0, ans=0.0 2023-11-18 20:37:41,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=399826.6666666667, ans=0.0 2023-11-18 20:37:58,849 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 11900, loss[loss=0.09451, simple_loss=0.1107, pruned_loss=0.02816, audio_tagging_loss=0.01103, over 16540.00 frames. ], tot_loss[loss=0.1014, simple_loss=0.1161, pruned_loss=0.03179, audio_tagging_loss=0.01154, over 3041580.34 frames. ], batch size: 62, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:38:20,010 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.73 vs. limit=15.0 2023-11-18 20:38:27,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=400093.3333333333, ans=0.125 2023-11-18 20:38:44,159 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.50 vs. limit=22.5 2023-11-18 20:38:47,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=400226.6666666667, ans=0.0 2023-11-18 20:38:49,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=400226.6666666667, ans=10.0 2023-11-18 20:38:56,557 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 11950, loss[loss=0.08794, simple_loss=0.09585, pruned_loss=0.02844, audio_tagging_loss=0.01157, over 14437.00 frames. ], tot_loss[loss=0.1005, simple_loss=0.1149, pruned_loss=0.03141, audio_tagging_loss=0.01162, over 3043524.81 frames. ], batch size: 56, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:38:58,616 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.199e+01 8.829e+01 9.865e+01 1.129e+02 1.573e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-18 20:39:34,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=400493.3333333333, ans=0.125 2023-11-18 20:39:35,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=400493.3333333333, ans=0.125 2023-11-18 20:39:40,565 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.08 vs. limit=22.5 2023-11-18 20:39:50,216 INFO [train_asr.py:1115] (3/4) Epoch 5, batch 12000, loss[loss=0.08796, simple_loss=0.09918, pruned_loss=0.02319, audio_tagging_loss=0.01518, over 14870.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1159, pruned_loss=0.0316, audio_tagging_loss=0.01166, over 3045500.07 frames. 
], batch size: 55, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:39:50,217 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-18 20:40:03,762 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.4736, 3.2700, 2.0232, 3.0529], device='cuda:3') 2023-11-18 20:40:07,090 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.3591, 4.0898, 3.6667, 3.1397], device='cuda:3') 2023-11-18 20:40:23,256 INFO [train_asr.py:1147] (3/4) Epoch 5, validation: loss=0.07195, simple_loss=0.05986, pruned_loss=0.008725, audio_tagging_loss=0.0333, over 4681554.00 frames. 2023-11-18 20:40:23,257 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-18 20:40:26,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=400626.6666666667, ans=0.125 2023-11-18 20:40:37,881 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.87 vs. limit=15.0 2023-11-18 20:40:39,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=400693.3333333333, ans=0.1 2023-11-18 20:41:23,817 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 0, loss[loss=0.09326, simple_loss=0.07738, pruned_loss=0.02022, audio_tagging_loss=0.03435, over 15118.00 frames. ], tot_loss[loss=0.09326, simple_loss=0.07738, pruned_loss=0.02022, audio_tagging_loss=0.03435, over 15118.00 frames. ], batch size: 57, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:41:23,817 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-18 20:41:41,433 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7379, 5.7620, 5.8237, 5.9164], device='cuda:3') 2023-11-18 20:41:55,532 INFO [train_asr.py:1147] (3/4) Epoch 6, validation: loss=0.07069, simple_loss=0.05989, pruned_loss=0.008764, audio_tagging_loss=0.03198, over 4681554.00 frames. 2023-11-18 20:41:55,532 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-18 20:41:59,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=400780.0, ans=0.0 2023-11-18 20:42:09,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=400846.6666666667, ans=0.125 2023-11-18 20:42:25,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=400913.3333333333, ans=0.125 2023-11-18 20:42:27,001 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.778e+01 9.356e+01 1.020e+02 1.152e+02 1.600e+02, threshold=2.040e+02, percent-clipped=0.0 2023-11-18 20:42:32,320 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.79 vs. limit=10.0 2023-11-18 20:42:38,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=401046.6666666667, ans=0.125 2023-11-18 20:42:50,316 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 50, loss[loss=0.104, simple_loss=0.1147, pruned_loss=0.02518, audio_tagging_loss=0.02145, over 15324.00 frames. 
], tot_loss[loss=0.106, simple_loss=0.1086, pruned_loss=0.0291, audio_tagging_loss=0.02256, over 692994.66 frames. ], batch size: 54, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:42:57,741 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.11 vs. limit=15.0 2023-11-18 20:43:04,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=401180.0, ans=0.0 2023-11-18 20:43:07,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=401180.0, ans=0.0 2023-11-18 20:43:23,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=401313.3333333333, ans=0.5 2023-11-18 20:43:27,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=401313.3333333333, ans=0.1 2023-11-18 20:43:30,091 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.31 vs. limit=22.5 2023-11-18 20:43:33,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=401313.3333333333, ans=0.1 2023-11-18 20:43:44,944 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.82 vs. limit=10.0 2023-11-18 20:43:47,289 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 100, loss[loss=0.122, simple_loss=0.1281, pruned_loss=0.03742, audio_tagging_loss=0.02058, over 15663.00 frames. ], tot_loss[loss=0.1076, simple_loss=0.1124, pruned_loss=0.03015, audio_tagging_loss=0.02121, over 1218182.81 frames. ], batch size: 56, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:43:54,491 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.51 vs. limit=15.0 2023-11-18 20:44:14,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=401580.0, ans=0.0 2023-11-18 20:44:19,276 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.750e+01 9.166e+01 9.950e+01 1.092e+02 1.419e+02, threshold=1.990e+02, percent-clipped=0.0 2023-11-18 20:44:26,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=401646.6666666667, ans=0.0 2023-11-18 20:44:27,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=401646.6666666667, ans=0.04949747468305833 2023-11-18 20:44:33,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=401713.3333333333, ans=0.1 2023-11-18 20:44:33,547 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.76 vs. limit=22.5 2023-11-18 20:44:43,066 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 150, loss[loss=0.08511, simple_loss=0.09655, pruned_loss=0.02284, audio_tagging_loss=0.014, over 14041.00 frames. ], tot_loss[loss=0.1064, simple_loss=0.1147, pruned_loss=0.03029, audio_tagging_loss=0.01869, over 1624248.35 frames. 
], batch size: 55, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:44:43,495 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.47 vs. limit=15.0 2023-11-18 20:45:04,941 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.94 vs. limit=22.5 2023-11-18 20:45:24,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=401980.0, ans=0.0 2023-11-18 20:45:31,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=402046.6666666667, ans=0.05 2023-11-18 20:45:39,163 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 200, loss[loss=0.07352, simple_loss=0.0958, pruned_loss=0.01666, audio_tagging_loss=0.008963, over 14833.00 frames. ], tot_loss[loss=0.1039, simple_loss=0.1141, pruned_loss=0.03021, audio_tagging_loss=0.0166, over 1940841.45 frames. ], batch size: 55, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:45:41,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=402113.3333333333, ans=0.2 2023-11-18 20:45:50,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=402180.0, ans=0.0 2023-11-18 20:45:58,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=402180.0, ans=0.09899494936611666 2023-11-18 20:46:06,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=402246.6666666667, ans=0.125 2023-11-18 20:46:11,546 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.743e+01 9.007e+01 1.004e+02 1.088e+02 1.464e+02, threshold=2.009e+02, percent-clipped=0.0 2023-11-18 20:46:14,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=402313.3333333333, ans=0.125 2023-11-18 20:46:17,044 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0 2023-11-18 20:46:35,570 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 250, loss[loss=0.107, simple_loss=0.1174, pruned_loss=0.03786, audio_tagging_loss=0.01048, over 15130.00 frames. ], tot_loss[loss=0.1026, simple_loss=0.1147, pruned_loss=0.03031, audio_tagging_loss=0.01494, over 2182720.07 frames. ], batch size: 55, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:46:43,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=402446.6666666667, ans=0.0 2023-11-18 20:46:50,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=402513.3333333333, ans=0.125 2023-11-18 20:46:52,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=402513.3333333333, ans=0.125 2023-11-18 20:46:54,812 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.02 vs. 
limit=12.0 2023-11-18 20:46:56,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=402580.0, ans=0.125 2023-11-18 20:46:57,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=402580.0, ans=0.0 2023-11-18 20:47:03,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=402580.0, ans=0.1 2023-11-18 20:47:12,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=402646.6666666667, ans=0.0 2023-11-18 20:47:21,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=402713.3333333333, ans=0.0 2023-11-18 20:47:31,866 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 300, loss[loss=0.0972, simple_loss=0.1169, pruned_loss=0.02549, audio_tagging_loss=0.01327, over 14469.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.1147, pruned_loss=0.03076, audio_tagging_loss=0.01395, over 2374296.54 frames. ], batch size: 53, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:47:34,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=402780.0, ans=0.1 2023-11-18 20:47:35,648 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.28 vs. limit=15.0 2023-11-18 20:47:55,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=402913.3333333333, ans=0.125 2023-11-18 20:47:59,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=402913.3333333333, ans=0.125 2023-11-18 20:48:03,988 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.203e+01 9.372e+01 1.051e+02 1.173e+02 1.706e+02, threshold=2.102e+02, percent-clipped=0.0 2023-11-18 20:48:06,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=402980.0, ans=0.125 2023-11-18 20:48:10,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=402980.0, ans=0.125 2023-11-18 20:48:27,619 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 350, loss[loss=0.1024, simple_loss=0.1144, pruned_loss=0.03297, audio_tagging_loss=0.01218, over 16276.00 frames. ], tot_loss[loss=0.1024, simple_loss=0.1166, pruned_loss=0.03112, audio_tagging_loss=0.01297, over 2532233.88 frames. 
], batch size: 59, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:48:45,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=403180.0, ans=0.0 2023-11-18 20:48:46,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=403180.0, ans=0.0 2023-11-18 20:48:52,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=403246.6666666667, ans=0.1 2023-11-18 20:49:01,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=403313.3333333333, ans=0.1 2023-11-18 20:49:07,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=403313.3333333333, ans=0.0 2023-11-18 20:49:10,061 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.72 vs. limit=12.0 2023-11-18 20:49:18,814 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=8.762e-01 2023-11-18 20:49:23,918 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 400, loss[loss=0.1118, simple_loss=0.1345, pruned_loss=0.03692, audio_tagging_loss=0.007649, over 14741.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1172, pruned_loss=0.03121, audio_tagging_loss=0.01246, over 2652640.90 frames. ], batch size: 55, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:49:37,303 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:49:40,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=403513.3333333333, ans=0.125 2023-11-18 20:49:43,916 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.95 vs. limit=15.0 2023-11-18 20:49:55,726 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.946e+01 9.366e+01 1.079e+02 1.287e+02 1.849e+02, threshold=2.157e+02, percent-clipped=0.0 2023-11-18 20:50:00,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=403646.6666666667, ans=0.0 2023-11-18 20:50:19,393 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 450, loss[loss=0.09931, simple_loss=0.1201, pruned_loss=0.02776, audio_tagging_loss=0.01147, over 15687.00 frames. ], tot_loss[loss=0.1006, simple_loss=0.1152, pruned_loss=0.03086, audio_tagging_loss=0.01208, over 2738732.19 frames. ], batch size: 59, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:50:30,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=403846.6666666667, ans=15.0 2023-11-18 20:50:35,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=403846.6666666667, ans=0.125 2023-11-18 20:51:07,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=404046.6666666667, ans=0.1 2023-11-18 20:51:15,173 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 500, loss[loss=0.1003, simple_loss=0.1215, pruned_loss=0.03013, audio_tagging_loss=0.009409, over 15042.00 frames. 
], tot_loss[loss=0.09949, simple_loss=0.1142, pruned_loss=0.03053, audio_tagging_loss=0.01188, over 2808929.29 frames. ], batch size: 54, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:51:47,909 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.211e+01 8.724e+01 9.545e+01 1.075e+02 1.901e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-18 20:51:59,371 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.89 vs. limit=15.0 2023-11-18 20:52:08,444 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2023-11-18 20:52:11,361 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 550, loss[loss=0.0542, simple_loss=0.05071, pruned_loss=0.0159, audio_tagging_loss=0.01295, over 14086.00 frames. ], tot_loss[loss=0.09986, simple_loss=0.1149, pruned_loss=0.03065, audio_tagging_loss=0.01177, over 2857299.63 frames. ], batch size: 56, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:52:32,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=404580.0, ans=0.125 2023-11-18 20:52:38,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=404580.0, ans=0.0 2023-11-18 20:52:45,149 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0 2023-11-18 20:52:53,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=404646.6666666667, ans=0.125 2023-11-18 20:53:00,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=404713.3333333333, ans=0.0 2023-11-18 20:53:01,067 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:53:03,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=404713.3333333333, ans=0.1 2023-11-18 20:53:04,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.81 vs. limit=22.5 2023-11-18 20:53:06,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=404780.0, ans=0.125 2023-11-18 20:53:07,246 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 600, loss[loss=0.1117, simple_loss=0.1288, pruned_loss=0.03773, audio_tagging_loss=0.009611, over 14960.00 frames. ], tot_loss[loss=0.1001, simple_loss=0.1155, pruned_loss=0.03076, audio_tagging_loss=0.0116, over 2902105.15 frames. 
], batch size: 56, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:53:32,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=404913.3333333333, ans=0.125 2023-11-18 20:53:40,229 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.079e+01 8.597e+01 9.522e+01 1.046e+02 1.696e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-18 20:53:40,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=404980.0, ans=0.125 2023-11-18 20:53:44,004 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.89 vs. limit=15.0 2023-11-18 20:53:53,139 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.21 vs. limit=15.0 2023-11-18 20:54:03,256 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 650, loss[loss=0.1397, simple_loss=0.1679, pruned_loss=0.04667, audio_tagging_loss=0.009078, over 14965.00 frames. ], tot_loss[loss=0.1002, simple_loss=0.1159, pruned_loss=0.03077, audio_tagging_loss=0.01151, over 2935352.90 frames. ], batch size: 55, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:54:25,598 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.61 vs. limit=15.0 2023-11-18 20:54:31,778 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:54:41,224 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.66 vs. limit=15.0 2023-11-18 20:54:59,366 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 700, loss[loss=0.1114, simple_loss=0.1296, pruned_loss=0.03422, audio_tagging_loss=0.01238, over 15148.00 frames. ], tot_loss[loss=0.1003, simple_loss=0.1162, pruned_loss=0.03087, audio_tagging_loss=0.0113, over 2969245.78 frames. ], batch size: 55, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:55:11,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=405513.3333333333, ans=0.125 2023-11-18 20:55:31,763 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.367e+01 9.330e+01 1.028e+02 1.121e+02 2.477e+02, threshold=2.056e+02, percent-clipped=1.0 2023-11-18 20:55:37,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=405646.6666666667, ans=0.1 2023-11-18 20:55:44,062 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:55:55,650 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 750, loss[loss=0.1345, simple_loss=0.1447, pruned_loss=0.04779, audio_tagging_loss=0.01433, over 15681.00 frames. ], tot_loss[loss=0.1019, simple_loss=0.1184, pruned_loss=0.03148, audio_tagging_loss=0.0112, over 2998053.13 frames. 
], batch size: 57, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:56:08,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=405846.6666666667, ans=0.1 2023-11-18 20:56:10,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=405846.6666666667, ans=0.1 2023-11-18 20:56:19,839 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:56:39,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=406046.6666666667, ans=0.125 2023-11-18 20:56:42,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=406046.6666666667, ans=0.125 2023-11-18 20:56:44,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=406046.6666666667, ans=10.0 2023-11-18 20:56:51,384 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 800, loss[loss=0.09512, simple_loss=0.1137, pruned_loss=0.02828, audio_tagging_loss=0.009973, over 15114.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.1172, pruned_loss=0.03109, audio_tagging_loss=0.01122, over 3016639.17 frames. ], batch size: 59, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:57:24,275 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 9.553e+01 1.008e+02 1.085e+02 1.896e+02, threshold=2.017e+02, percent-clipped=0.0 2023-11-18 20:57:32,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=406313.3333333333, ans=0.035 2023-11-18 20:57:40,827 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.70 vs. limit=6.0 2023-11-18 20:57:46,585 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 850, loss[loss=0.09121, simple_loss=0.1099, pruned_loss=0.02526, audio_tagging_loss=0.01098, over 15300.00 frames. ], tot_loss[loss=0.09948, simple_loss=0.1153, pruned_loss=0.03044, audio_tagging_loss=0.01141, over 3027223.56 frames. ], batch size: 55, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:58:03,213 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:58:12,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=406580.0, ans=0.125 2023-11-18 20:58:15,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=406580.0, ans=0.125 2023-11-18 20:58:15,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=406580.0, ans=0.2 2023-11-18 20:58:18,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=406580.0, ans=0.1 2023-11-18 20:58:38,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=406713.3333333333, ans=0.125 2023-11-18 20:58:43,495 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 900, loss[loss=0.1347, simple_loss=0.1648, pruned_loss=0.04622, audio_tagging_loss=0.006111, over 15175.00 frames. 
], tot_loss[loss=0.09915, simple_loss=0.115, pruned_loss=0.03028, audio_tagging_loss=0.01139, over 3037348.20 frames. ], batch size: 53, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:58:48,017 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=406780.0, ans=0.04949747468305833 2023-11-18 20:58:53,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=406846.6666666667, ans=0.125 2023-11-18 20:58:56,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=406846.6666666667, ans=0.125 2023-11-18 20:59:13,981 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.27 vs. limit=15.0 2023-11-18 20:59:15,302 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.061e+01 8.915e+01 9.624e+01 1.067e+02 1.384e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-18 20:59:39,118 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 950, loss[loss=0.1102, simple_loss=0.1235, pruned_loss=0.0354, audio_tagging_loss=0.01302, over 15143.00 frames. ], tot_loss[loss=0.09919, simple_loss=0.115, pruned_loss=0.0303, audio_tagging_loss=0.0114, over 3042636.72 frames. ], batch size: 58, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:59:46,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=407113.3333333333, ans=0.125 2023-11-18 20:59:49,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=407180.0, ans=0.125 2023-11-18 20:59:51,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=407180.0, ans=0.09899494936611666 2023-11-18 20:59:55,823 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.48 vs. limit=15.0 2023-11-18 21:00:17,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=407313.3333333333, ans=0.0 2023-11-18 21:00:18,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=407313.3333333333, ans=0.125 2023-11-18 21:00:34,304 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 1000, loss[loss=0.09163, simple_loss=0.1054, pruned_loss=0.02622, audio_tagging_loss=0.01272, over 15420.00 frames. ], tot_loss[loss=0.09805, simple_loss=0.1138, pruned_loss=0.02985, audio_tagging_loss=0.0113, over 3039095.61 frames. ], batch size: 58, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:00:40,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=407446.6666666667, ans=0.125 2023-11-18 21:00:51,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=407513.3333333333, ans=0.1 2023-11-18 21:00:53,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=407513.3333333333, ans=0.0 2023-11-18 21:00:58,789 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:01:00,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=407580.0, ans=0.05 2023-11-18 21:01:07,163 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.255e+01 8.744e+01 1.004e+02 1.144e+02 1.885e+02, threshold=2.008e+02, percent-clipped=0.0 2023-11-18 21:01:08,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=407646.6666666667, ans=0.2 2023-11-18 21:01:13,309 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.30 vs. limit=15.0 2023-11-18 21:01:17,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=407646.6666666667, ans=0.2 2023-11-18 21:01:18,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=407713.3333333333, ans=0.125 2023-11-18 21:01:19,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=407713.3333333333, ans=0.125 2023-11-18 21:01:20,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=407713.3333333333, ans=0.125 2023-11-18 21:01:29,352 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.24 vs. limit=22.5 2023-11-18 21:01:30,880 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 1050, loss[loss=0.1102, simple_loss=0.1294, pruned_loss=0.03649, audio_tagging_loss=0.008994, over 15675.00 frames. ], tot_loss[loss=0.09854, simple_loss=0.1142, pruned_loss=0.03026, audio_tagging_loss=0.0112, over 3035715.66 frames. ], batch size: 59, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:01:46,069 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.59 vs. limit=22.5 2023-11-18 21:01:59,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=407913.3333333333, ans=0.125 2023-11-18 21:02:05,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=407980.0, ans=0.125 2023-11-18 21:02:18,485 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.69 vs. limit=15.0 2023-11-18 21:02:25,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=408046.6666666667, ans=0.2 2023-11-18 21:02:27,543 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 1100, loss[loss=0.07818, simple_loss=0.09192, pruned_loss=0.02218, audio_tagging_loss=0.01004, over 15034.00 frames. ], tot_loss[loss=0.0973, simple_loss=0.1128, pruned_loss=0.02979, audio_tagging_loss=0.01112, over 3034271.83 frames. 
], batch size: 58, lr: 1.19e-02, grad_scale: 16.0 2023-11-18 21:02:29,730 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:02:44,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=408180.0, ans=0.125 2023-11-18 21:02:48,969 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.69 vs. limit=15.0 2023-11-18 21:02:52,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=408246.6666666667, ans=0.125 2023-11-18 21:02:52,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=408246.6666666667, ans=0.1 2023-11-18 21:02:54,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=408246.6666666667, ans=0.125 2023-11-18 21:03:00,441 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.573e+01 9.716e+01 1.058e+02 1.424e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-18 21:03:08,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=408313.3333333333, ans=0.5 2023-11-18 21:03:22,703 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 1150, loss[loss=0.1233, simple_loss=0.1492, pruned_loss=0.03955, audio_tagging_loss=0.009111, over 15482.00 frames. ], tot_loss[loss=0.09727, simple_loss=0.1129, pruned_loss=0.02977, audio_tagging_loss=0.01107, over 3041152.86 frames. ], batch size: 57, lr: 1.19e-02, grad_scale: 16.0 2023-11-18 21:03:24,241 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.92 vs. limit=12.0 2023-11-18 21:03:27,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=408446.6666666667, ans=0.125 2023-11-18 21:03:31,120 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.81 vs. limit=10.0 2023-11-18 21:03:42,528 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.41 vs. limit=15.0 2023-11-18 21:03:55,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=408580.0, ans=0.0 2023-11-18 21:04:00,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=408646.6666666667, ans=0.125 2023-11-18 21:04:13,889 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.31 vs. 
limit=10.0 2023-11-18 21:04:19,246 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 1200, loss[loss=0.07276, simple_loss=0.08218, pruned_loss=0.02057, audio_tagging_loss=0.0111, over 15457.00 frames. ], tot_loss[loss=0.09702, simple_loss=0.1127, pruned_loss=0.02964, audio_tagging_loss=0.01104, over 3039212.49 frames. ], batch size: 60, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:04:52,326 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 9.018e+01 9.709e+01 1.057e+02 1.336e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-18 21:05:15,226 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 1250, loss[loss=0.09748, simple_loss=0.1056, pruned_loss=0.03313, audio_tagging_loss=0.01158, over 15091.00 frames. ], tot_loss[loss=0.0984, simple_loss=0.1142, pruned_loss=0.03035, audio_tagging_loss=0.01095, over 3041988.40 frames. ], batch size: 56, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:05:23,932 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.23 vs. limit=12.0 2023-11-18 21:05:25,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=409180.0, ans=0.0 2023-11-18 21:05:43,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=409246.6666666667, ans=0.125 2023-11-18 21:05:57,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=409313.3333333333, ans=0.0 2023-11-18 21:06:07,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=409380.0, ans=0.2 2023-11-18 21:06:11,359 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 1300, loss[loss=0.09632, simple_loss=0.1153, pruned_loss=0.02763, audio_tagging_loss=0.01103, over 14610.00 frames. ], tot_loss[loss=0.09857, simple_loss=0.1148, pruned_loss=0.03026, audio_tagging_loss=0.01091, over 3041317.62 frames. ], batch size: 55, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:06:11,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=409446.6666666667, ans=0.0 2023-11-18 21:06:13,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=409446.6666666667, ans=0.0 2023-11-18 21:06:27,898 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.37 vs. 
limit=15.0 2023-11-18 21:06:31,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=409513.3333333333, ans=0.2 2023-11-18 21:06:45,371 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.998e+01 8.886e+01 9.349e+01 1.016e+02 1.502e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-18 21:06:51,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=409646.6666666667, ans=0.2 2023-11-18 21:06:52,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=409646.6666666667, ans=0.0 2023-11-18 21:06:54,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=409646.6666666667, ans=0.0 2023-11-18 21:07:07,820 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 1350, loss[loss=0.1089, simple_loss=0.1266, pruned_loss=0.03535, audio_tagging_loss=0.01027, over 15047.00 frames. ], tot_loss[loss=0.09864, simple_loss=0.115, pruned_loss=0.03026, audio_tagging_loss=0.01088, over 3052222.12 frames. ], batch size: 55, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:07:10,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=409780.0, ans=0.125 2023-11-18 21:07:25,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=409846.6666666667, ans=0.1 2023-11-18 21:07:27,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=409846.6666666667, ans=0.0 2023-11-18 21:07:41,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=409980.0, ans=0.125 2023-11-18 21:07:47,873 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:08:03,666 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 1400, loss[loss=0.09417, simple_loss=0.1173, pruned_loss=0.0266, audio_tagging_loss=0.008949, over 13549.00 frames. ], tot_loss[loss=0.09819, simple_loss=0.1143, pruned_loss=0.03007, audio_tagging_loss=0.01095, over 3047481.04 frames. ], batch size: 50, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:08:04,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=410113.3333333333, ans=0.0 2023-11-18 21:08:21,442 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 21:08:25,625 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.59 vs. 
limit=22.5 2023-11-18 21:08:37,130 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.960e+01 8.879e+01 9.810e+01 1.048e+02 1.417e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-18 21:08:38,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=410313.3333333333, ans=0.2 2023-11-18 21:08:59,555 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 1450, loss[loss=0.108, simple_loss=0.1308, pruned_loss=0.03477, audio_tagging_loss=0.007804, over 15863.00 frames. ], tot_loss[loss=0.09817, simple_loss=0.1141, pruned_loss=0.03015, audio_tagging_loss=0.01097, over 3054934.16 frames. ], batch size: 59, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:09:12,208 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 21:09:20,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=410513.3333333333, ans=0.02 2023-11-18 21:09:33,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=410646.6666666667, ans=0.0 2023-11-18 21:09:56,062 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 1500, loss[loss=0.08428, simple_loss=0.1004, pruned_loss=0.02054, audio_tagging_loss=0.01353, over 15852.00 frames. ], tot_loss[loss=0.09826, simple_loss=0.1142, pruned_loss=0.03004, audio_tagging_loss=0.0111, over 3054623.74 frames. ], batch size: 60, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:10:03,497 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.87 vs. limit=22.5 2023-11-18 21:10:10,147 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.56 vs. limit=10.0 2023-11-18 21:10:19,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=410913.3333333333, ans=0.1 2023-11-18 21:10:29,754 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.050e+01 8.852e+01 9.763e+01 1.053e+02 1.656e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-18 21:10:51,953 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 1550, loss[loss=0.09685, simple_loss=0.1127, pruned_loss=0.03067, audio_tagging_loss=0.009851, over 14361.00 frames. ], tot_loss[loss=0.09908, simple_loss=0.1148, pruned_loss=0.03041, audio_tagging_loss=0.01125, over 3049077.28 frames. ], batch size: 56, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:11:01,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=411113.3333333333, ans=0.125 2023-11-18 21:11:26,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=411313.3333333333, ans=0.0 2023-11-18 21:11:26,689 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.95 vs. 
limit=10.0 2023-11-18 21:11:27,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=411313.3333333333, ans=0.125 2023-11-18 21:11:29,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=411313.3333333333, ans=0.125 2023-11-18 21:11:38,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=411380.0, ans=0.125 2023-11-18 21:11:47,825 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 1600, loss[loss=0.09388, simple_loss=0.1036, pruned_loss=0.02686, audio_tagging_loss=0.01524, over 13778.00 frames. ], tot_loss[loss=0.09948, simple_loss=0.1155, pruned_loss=0.0304, audio_tagging_loss=0.01135, over 3052194.74 frames. ], batch size: 53, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:11:58,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=411513.3333333333, ans=0.125 2023-11-18 21:12:12,313 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.98 vs. limit=15.0 2023-11-18 21:12:21,738 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.758e+01 8.910e+01 9.772e+01 1.109e+02 1.512e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-18 21:12:44,088 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 1650, loss[loss=0.1022, simple_loss=0.1283, pruned_loss=0.02816, audio_tagging_loss=0.009928, over 15699.00 frames. ], tot_loss[loss=0.09882, simple_loss=0.1147, pruned_loss=0.03002, audio_tagging_loss=0.01145, over 3057173.33 frames. ], batch size: 57, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:13:04,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=411846.6666666667, ans=0.0 2023-11-18 21:13:25,068 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.55 vs. limit=10.0 2023-11-18 21:13:30,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=412046.6666666667, ans=0.2 2023-11-18 21:13:39,902 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 1700, loss[loss=0.09064, simple_loss=0.1085, pruned_loss=0.02399, audio_tagging_loss=0.0124, over 14802.00 frames. ], tot_loss[loss=0.09864, simple_loss=0.1141, pruned_loss=0.03, audio_tagging_loss=0.01161, over 3052906.52 frames. ], batch size: 54, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:13:47,524 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.55 vs. limit=15.0 2023-11-18 21:13:52,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=412180.0, ans=0.125 2023-11-18 21:13:58,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=412180.0, ans=0.1 2023-11-18 21:14:01,391 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.75 vs. 
limit=6.0 2023-11-18 21:14:13,796 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 9.349e+01 1.070e+02 1.315e+02 2.031e+02, threshold=2.140e+02, percent-clipped=2.0 2023-11-18 21:14:35,713 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 1750, loss[loss=0.1162, simple_loss=0.143, pruned_loss=0.03755, audio_tagging_loss=0.007168, over 15260.00 frames. ], tot_loss[loss=0.09772, simple_loss=0.1131, pruned_loss=0.02962, audio_tagging_loss=0.01152, over 3056473.98 frames. ], batch size: 54, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:14:36,185 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.00 vs. limit=15.0 2023-11-18 21:14:37,442 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.84 vs. limit=22.5 2023-11-18 21:15:17,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=412646.6666666667, ans=0.125 2023-11-18 21:15:20,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=412713.3333333333, ans=0.125 2023-11-18 21:15:31,634 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 1800, loss[loss=0.08054, simple_loss=0.09745, pruned_loss=0.01979, audio_tagging_loss=0.01203, over 15550.00 frames. ], tot_loss[loss=0.09773, simple_loss=0.1134, pruned_loss=0.02976, audio_tagging_loss=0.01126, over 3051717.92 frames. ], batch size: 59, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:16:06,198 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.676e+01 9.052e+01 1.017e+02 1.096e+02 2.007e+02, threshold=2.033e+02, percent-clipped=0.0 2023-11-18 21:16:13,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=412980.0, ans=0.0 2023-11-18 21:16:17,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=413046.6666666667, ans=0.125 2023-11-18 21:16:27,505 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 1850, loss[loss=0.0876, simple_loss=0.09084, pruned_loss=0.02965, audio_tagging_loss=0.01253, over 15143.00 frames. ], tot_loss[loss=0.09782, simple_loss=0.1134, pruned_loss=0.02984, audio_tagging_loss=0.01128, over 3049677.85 frames. 
], batch size: 60, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:16:45,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=413180.0, ans=0.0 2023-11-18 21:16:49,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=413246.6666666667, ans=0.2 2023-11-18 21:16:51,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=413246.6666666667, ans=0.0 2023-11-18 21:17:02,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=413313.3333333333, ans=0.0 2023-11-18 21:17:12,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=413380.0, ans=0.0 2023-11-18 21:17:14,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=413380.0, ans=0.0 2023-11-18 21:17:17,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=413380.0, ans=0.125 2023-11-18 21:17:23,592 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 1900, loss[loss=0.1293, simple_loss=0.1546, pruned_loss=0.04161, audio_tagging_loss=0.01043, over 16911.00 frames. ], tot_loss[loss=0.0983, simple_loss=0.1142, pruned_loss=0.03001, audio_tagging_loss=0.01118, over 3053177.77 frames. ], batch size: 63, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:17:31,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=413446.6666666667, ans=0.0 2023-11-18 21:17:57,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=413646.6666666667, ans=0.02 2023-11-18 21:17:58,662 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.936e+01 9.160e+01 9.941e+01 1.091e+02 1.656e+02, threshold=1.988e+02, percent-clipped=0.0 2023-11-18 21:18:09,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=413713.3333333333, ans=0.0 2023-11-18 21:18:13,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=413713.3333333333, ans=0.0 2023-11-18 21:18:13,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=413713.3333333333, ans=0.1 2023-11-18 21:18:16,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=413713.3333333333, ans=0.125 2023-11-18 21:18:17,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=413713.3333333333, ans=0.0 2023-11-18 21:18:19,655 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 1950, loss[loss=0.1104, simple_loss=0.1349, pruned_loss=0.03201, audio_tagging_loss=0.01091, over 15802.00 frames. ], tot_loss[loss=0.097, simple_loss=0.1126, pruned_loss=0.02951, audio_tagging_loss=0.01117, over 3051041.62 frames. 
], batch size: 59, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:18:35,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=413846.6666666667, ans=0.1 2023-11-18 21:18:51,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=413913.3333333333, ans=0.07 2023-11-18 21:18:59,909 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.26 vs. limit=10.0 2023-11-18 21:19:02,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=413980.0, ans=0.125 2023-11-18 21:19:15,973 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 2000, loss[loss=0.06907, simple_loss=0.07276, pruned_loss=0.01965, audio_tagging_loss=0.01304, over 15535.00 frames. ], tot_loss[loss=0.09639, simple_loss=0.1117, pruned_loss=0.02927, audio_tagging_loss=0.01128, over 3044570.73 frames. ], batch size: 61, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:19:17,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=414113.3333333333, ans=0.125 2023-11-18 21:19:22,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=414113.3333333333, ans=0.125 2023-11-18 21:19:28,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=414180.0, ans=0.5 2023-11-18 21:19:34,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=414180.0, ans=0.1 2023-11-18 21:19:44,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=414246.6666666667, ans=0.125 2023-11-18 21:19:49,199 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.46 vs. limit=15.0 2023-11-18 21:19:50,562 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.514e+01 8.645e+01 9.576e+01 1.020e+02 1.190e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-18 21:20:11,738 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 2050, loss[loss=0.1234, simple_loss=0.153, pruned_loss=0.03879, audio_tagging_loss=0.008118, over 14988.00 frames. ], tot_loss[loss=0.09724, simple_loss=0.1125, pruned_loss=0.0297, audio_tagging_loss=0.01127, over 3038092.59 frames. ], batch size: 55, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:20:12,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=414446.6666666667, ans=0.125 2023-11-18 21:20:48,425 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0 2023-11-18 21:20:53,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=414646.6666666667, ans=0.1 2023-11-18 21:21:07,351 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 2100, loss[loss=0.1205, simple_loss=0.1557, pruned_loss=0.03721, audio_tagging_loss=0.005406, over 15857.00 frames. 
], tot_loss[loss=0.09798, simple_loss=0.1134, pruned_loss=0.03006, audio_tagging_loss=0.0112, over 3045321.15 frames. ], batch size: 54, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:21:13,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=414780.0, ans=0.0 2023-11-18 21:21:26,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=414846.6666666667, ans=0.125 2023-11-18 21:21:32,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=414913.3333333333, ans=0.09899494936611666 2023-11-18 21:21:42,506 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.286e+01 9.113e+01 9.926e+01 1.128e+02 1.703e+02, threshold=1.985e+02, percent-clipped=0.0 2023-11-18 21:21:45,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=414980.0, ans=0.0 2023-11-18 21:22:03,829 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 2150, loss[loss=0.1089, simple_loss=0.1309, pruned_loss=0.03229, audio_tagging_loss=0.01115, over 14954.00 frames. ], tot_loss[loss=0.09743, simple_loss=0.1129, pruned_loss=0.02976, audio_tagging_loss=0.01123, over 3050027.81 frames. ], batch size: 53, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:22:36,326 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:22:37,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=415313.3333333333, ans=0.125 2023-11-18 21:22:59,974 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 2200, loss[loss=0.128, simple_loss=0.1492, pruned_loss=0.04583, audio_tagging_loss=0.007539, over 15692.00 frames. ], tot_loss[loss=0.09763, simple_loss=0.1131, pruned_loss=0.02974, audio_tagging_loss=0.01135, over 3045806.02 frames. ], batch size: 55, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:23:04,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=415446.6666666667, ans=0.125 2023-11-18 21:23:06,649 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 21:23:11,418 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.45 vs. 
limit=15.0 2023-11-18 21:23:27,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=415580.0, ans=0.1 2023-11-18 21:23:34,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=415646.6666666667, ans=0.0 2023-11-18 21:23:35,840 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 8.875e+01 9.823e+01 1.124e+02 2.816e+02, threshold=1.965e+02, percent-clipped=1.0 2023-11-18 21:23:43,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=415713.3333333333, ans=0.025 2023-11-18 21:23:55,604 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 2250, loss[loss=0.08455, simple_loss=0.09836, pruned_loss=0.0231, audio_tagging_loss=0.01227, over 15353.00 frames. ], tot_loss[loss=0.0975, simple_loss=0.1127, pruned_loss=0.02973, audio_tagging_loss=0.01139, over 3038290.49 frames. ], batch size: 60, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:24:21,644 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=15.0 2023-11-18 21:24:39,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=416046.6666666667, ans=0.0 2023-11-18 21:24:48,228 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.88 vs. limit=22.5 2023-11-18 21:24:51,861 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 2300, loss[loss=0.1152, simple_loss=0.1311, pruned_loss=0.03939, audio_tagging_loss=0.01029, over 16949.00 frames. ], tot_loss[loss=0.09817, simple_loss=0.1136, pruned_loss=0.03001, audio_tagging_loss=0.01136, over 3044953.97 frames. ], batch size: 66, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:25:00,056 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0 2023-11-18 21:25:01,026 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.04 vs. limit=15.0 2023-11-18 21:25:27,503 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.888e+01 8.781e+01 9.557e+01 1.037e+02 1.979e+02, threshold=1.911e+02, percent-clipped=1.0 2023-11-18 21:25:39,376 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:25:40,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=416380.0, ans=0.125 2023-11-18 21:25:44,724 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.16 vs. 
limit=15.0 2023-11-18 21:25:47,864 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 2350, loss[loss=0.09348, simple_loss=0.1086, pruned_loss=0.02678, audio_tagging_loss=0.01241, over 16465.00 frames. ], tot_loss[loss=0.0974, simple_loss=0.113, pruned_loss=0.02958, audio_tagging_loss=0.01135, over 3035831.53 frames. ], batch size: 61, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:26:31,003 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs. limit=6.0 2023-11-18 21:26:43,288 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 2400, loss[loss=0.09134, simple_loss=0.1075, pruned_loss=0.02772, audio_tagging_loss=0.009883, over 15409.00 frames. ], tot_loss[loss=0.09786, simple_loss=0.1134, pruned_loss=0.0297, audio_tagging_loss=0.01146, over 3038379.78 frames. ], batch size: 58, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:26:43,964 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.56 vs. limit=22.5 2023-11-18 21:26:51,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=416780.0, ans=0.0 2023-11-18 21:27:12,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=416913.3333333333, ans=0.125 2023-11-18 21:27:20,426 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 8.680e+01 9.662e+01 1.129e+02 1.566e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-18 21:27:27,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=417046.6666666667, ans=0.125 2023-11-18 21:27:39,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=417113.3333333333, ans=22.5 2023-11-18 21:27:39,646 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 2450, loss[loss=0.06724, simple_loss=0.06795, pruned_loss=0.01691, audio_tagging_loss=0.01636, over 14014.00 frames. ], tot_loss[loss=0.09791, simple_loss=0.1133, pruned_loss=0.02964, audio_tagging_loss=0.0116, over 3039412.67 frames. ], batch size: 55, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:27:42,604 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.69 vs. limit=15.0 2023-11-18 21:27:55,605 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.53 vs. 
limit=12.0 2023-11-18 21:28:15,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=417313.3333333333, ans=0.04949747468305833 2023-11-18 21:28:16,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=417313.3333333333, ans=0.0 2023-11-18 21:28:21,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=417313.3333333333, ans=0.1 2023-11-18 21:28:25,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=417380.0, ans=0.125 2023-11-18 21:28:35,698 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 2500, loss[loss=0.08355, simple_loss=0.1043, pruned_loss=0.01887, audio_tagging_loss=0.01254, over 15900.00 frames. ], tot_loss[loss=0.09831, simple_loss=0.1138, pruned_loss=0.02977, audio_tagging_loss=0.01164, over 3039150.83 frames. ], batch size: 60, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:28:42,806 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 21:29:00,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=417580.0, ans=0.2 2023-11-18 21:29:02,423 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.55 vs. limit=15.0 2023-11-18 21:29:05,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=417580.0, ans=0.1 2023-11-18 21:29:07,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=417580.0, ans=0.125 2023-11-18 21:29:12,269 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.822e+01 8.909e+01 1.012e+02 1.108e+02 1.409e+02, threshold=2.024e+02, percent-clipped=0.0 2023-11-18 21:29:17,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=417646.6666666667, ans=0.2 2023-11-18 21:29:31,818 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 2550, loss[loss=0.09077, simple_loss=0.09896, pruned_loss=0.02752, audio_tagging_loss=0.01377, over 14142.00 frames. ], tot_loss[loss=0.09858, simple_loss=0.1144, pruned_loss=0.03009, audio_tagging_loss=0.01131, over 3037070.24 frames. ], batch size: 52, lr: 1.17e-02, grad_scale: 16.0 2023-11-18 21:30:08,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=417980.0, ans=0.125 2023-11-18 21:30:15,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=418046.6666666667, ans=0.125 2023-11-18 21:30:17,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=418046.6666666667, ans=0.0 2023-11-18 21:30:27,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=418113.3333333333, ans=0.1 2023-11-18 21:30:28,046 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 2600, loss[loss=0.1111, simple_loss=0.1372, pruned_loss=0.03404, audio_tagging_loss=0.008488, over 17257.00 frames. 
], tot_loss[loss=0.09881, simple_loss=0.1148, pruned_loss=0.03021, audio_tagging_loss=0.01118, over 3040019.99 frames. ], batch size: 63, lr: 1.17e-02, grad_scale: 16.0 2023-11-18 21:30:29,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=418113.3333333333, ans=0.125 2023-11-18 21:30:38,191 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.78 vs. limit=15.0 2023-11-18 21:30:48,007 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 21:31:04,867 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.916e+01 1.001e+02 1.139e+02 1.588e+02, threshold=2.001e+02, percent-clipped=0.0 2023-11-18 21:31:16,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=418380.0, ans=0.1 2023-11-18 21:31:24,010 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 2650, loss[loss=0.1241, simple_loss=0.1459, pruned_loss=0.0395, audio_tagging_loss=0.01171, over 15446.00 frames. ], tot_loss[loss=0.09938, simple_loss=0.116, pruned_loss=0.03041, audio_tagging_loss=0.01098, over 3046081.96 frames. ], batch size: 57, lr: 1.17e-02, grad_scale: 16.0 2023-11-18 21:31:35,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=418513.3333333333, ans=0.95 2023-11-18 21:32:14,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=418713.3333333333, ans=0.0 2023-11-18 21:32:19,300 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.79 vs. limit=10.0 2023-11-18 21:32:19,686 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 2700, loss[loss=0.08185, simple_loss=0.09328, pruned_loss=0.02246, audio_tagging_loss=0.01275, over 16020.00 frames. ], tot_loss[loss=0.09952, simple_loss=0.116, pruned_loss=0.03054, audio_tagging_loss=0.01096, over 3048725.57 frames. ], batch size: 60, lr: 1.17e-02, grad_scale: 16.0 2023-11-18 21:32:25,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=418780.0, ans=0.0 2023-11-18 21:32:53,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=418980.0, ans=0.2 2023-11-18 21:32:56,912 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.943e+01 9.076e+01 9.942e+01 1.068e+02 1.459e+02, threshold=1.988e+02, percent-clipped=0.0 2023-11-18 21:33:16,779 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 2750, loss[loss=0.09791, simple_loss=0.1094, pruned_loss=0.02935, audio_tagging_loss=0.01387, over 15545.00 frames. ], tot_loss[loss=0.09967, simple_loss=0.1161, pruned_loss=0.03058, audio_tagging_loss=0.01104, over 3057556.66 frames. ], batch size: 58, lr: 1.17e-02, grad_scale: 16.0 2023-11-18 21:33:33,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=419180.0, ans=0.0 2023-11-18 21:33:34,769 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.94 vs. 
limit=12.0 2023-11-18 21:34:03,599 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:34:06,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=419380.0, ans=0.2 2023-11-18 21:34:12,565 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 2800, loss[loss=0.09698, simple_loss=0.1152, pruned_loss=0.03218, audio_tagging_loss=0.007217, over 15144.00 frames. ], tot_loss[loss=0.09814, simple_loss=0.1141, pruned_loss=0.02999, audio_tagging_loss=0.0111, over 3052882.16 frames. ], batch size: 56, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:34:19,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=419446.6666666667, ans=0.0 2023-11-18 21:34:34,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=419580.0, ans=0.0 2023-11-18 21:34:49,209 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.663e+01 8.954e+01 9.859e+01 1.088e+02 1.629e+02, threshold=1.972e+02, percent-clipped=0.0 2023-11-18 21:34:50,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=419646.6666666667, ans=0.125 2023-11-18 21:35:07,685 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 2850, loss[loss=0.1106, simple_loss=0.1392, pruned_loss=0.03395, audio_tagging_loss=0.007039, over 16119.00 frames. ], tot_loss[loss=0.0976, simple_loss=0.1136, pruned_loss=0.0297, audio_tagging_loss=0.01109, over 3044112.59 frames. ], batch size: 57, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:35:10,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=419780.0, ans=0.125 2023-11-18 21:35:10,432 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.95 vs. limit=15.0 2023-11-18 21:35:11,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=419780.0, ans=0.0 2023-11-18 21:35:13,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=419780.0, ans=0.125 2023-11-18 21:35:14,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=419780.0, ans=0.125 2023-11-18 21:35:20,945 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.06 vs. limit=10.0 2023-11-18 21:35:58,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=420046.6666666667, ans=0.125 2023-11-18 21:36:05,316 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 2900, loss[loss=0.0902, simple_loss=0.1069, pruned_loss=0.02603, audio_tagging_loss=0.01072, over 15302.00 frames. 
], tot_loss[loss=0.09756, simple_loss=0.1134, pruned_loss=0.02972, audio_tagging_loss=0.01114, over 3040012.82 frames. ], batch size: 61, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:36:09,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=420113.3333333333, ans=0.0 2023-11-18 21:36:10,808 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 21:36:36,185 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0 2023-11-18 21:36:40,896 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.321e+01 8.758e+01 9.574e+01 1.055e+02 1.297e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-18 21:36:41,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=420313.3333333333, ans=0.1 2023-11-18 21:36:50,717 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=420380.0, ans=0.1 2023-11-18 21:36:56,411 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.64 vs. limit=15.0 2023-11-18 21:37:00,109 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 2950, loss[loss=0.1053, simple_loss=0.1258, pruned_loss=0.03154, audio_tagging_loss=0.0109, over 14427.00 frames. ], tot_loss[loss=0.09802, simple_loss=0.1139, pruned_loss=0.02994, audio_tagging_loss=0.01115, over 3044899.67 frames. ], batch size: 54, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:37:20,540 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.30 vs. limit=15.0 2023-11-18 21:37:21,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=420580.0, ans=0.125 2023-11-18 21:37:29,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=420580.0, ans=0.1 2023-11-18 21:37:29,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=420580.0, ans=0.0 2023-11-18 21:37:50,635 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.37 vs. limit=22.5 2023-11-18 21:37:55,424 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 3000, loss[loss=0.1076, simple_loss=0.1336, pruned_loss=0.03097, audio_tagging_loss=0.009888, over 15290.00 frames. ], tot_loss[loss=0.09806, simple_loss=0.1141, pruned_loss=0.02993, audio_tagging_loss=0.01108, over 3045883.54 frames. ], batch size: 56, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:37:55,425 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-18 21:38:24,960 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9534, 2.6113, 2.9724, 3.5016, 3.2715, 2.6018, 3.0445, 3.1094], device='cuda:3') 2023-11-18 21:38:28,439 INFO [train_asr.py:1147] (3/4) Epoch 6, validation: loss=0.07003, simple_loss=0.05914, pruned_loss=0.008279, audio_tagging_loss=0.03218, over 4681554.00 frames. 
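[Editorial note] The `loss[...]` / `tot_loss[... over N frames ...]` entries throughout this log report frame-weighted loss averages, with the decayed frame total plateauing around 3.0e6 frames (roughly 200x the per-batch frame counts of ~15,000 shown here). A minimal Python sketch of such a tracker follows; the class name, the decay of 1 - 1/200, and the update rule are illustrative assumptions chosen to reproduce the numbers above, not the actual train_asr.py implementation.

```python
class RunningLoss:
    """Exponentially decayed, frame-weighted running averages of loss terms (a sketch)."""

    def __init__(self, decay: float = 1 - 1 / 200):
        self.decay = decay
        self.frames = 0.0  # decayed frame total; plateaus near 200x the per-batch frames
        self.sums = {}     # decayed frame-weighted sums, one entry per loss term

    def update(self, batch_frames: float, **losses: float) -> None:
        self.frames = self.frames * self.decay + batch_frames
        for name, value in losses.items():
            self.sums[name] = self.sums.get(name, 0.0) * self.decay + value * batch_frames

    def averages(self) -> dict:
        # Each printed tot_loss value corresponds to sum / total_frames.
        return {name: s / self.frames for name, s in self.sums.items()}


tracker = RunningLoss()
tracker.update(16465, loss=0.09348, simple_loss=0.1086,
               pruned_loss=0.02678, audio_tagging_loss=0.01241)
print(tracker.averages(), f"over {tracker.frames:.2f} frames")
```

With a steady per-batch frame count b, the decayed total converges to b / (1 - decay) = 200 * b, which is why the "over N frames" figure hovers near 3.04e6 for most of the epoch.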
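[Editorial note] The recurring `WARNING ... Exclude cut with ID unbalanced/... from training` entries drop AudioSet cuts whose dummy transcript has more BPE tokens (24) than encoder output frames after subsampling (23), since no valid transducer alignment can exist in that case. A hedged sketch of that check follows; the function name and the exact subsampling arithmetic are assumptions, picked only because they reproduce the 100 -> 23 frame figure reported in the warnings:

```python
def keep_cut(num_input_frames: int, num_tokens: int) -> bool:
    # Hypothetical ~4x subsampling arithmetic: (100 - 7) // 4 == 23 matches
    # the "Number of frames (after subsampling): 23" figure in the log.
    num_output_frames = (num_input_frames - 7) // 4
    # A transducer needs at least one output frame per token.
    return num_output_frames >= num_tokens

print(keep_cut(100, 24))   # False -> excluded, as in the WARNING entries
print(keep_cut(1000, 24))  # True  -> kept
```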
2023-11-18 21:38:28,440 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-18 21:38:41,348 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.00 vs. limit=15.0 2023-11-18 21:39:02,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=420980.0, ans=0.125 2023-11-18 21:39:03,792 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.758e+01 9.190e+01 1.009e+02 1.131e+02 1.432e+02, threshold=2.017e+02, percent-clipped=0.0 2023-11-18 21:39:13,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=421046.6666666667, ans=0.0 2023-11-18 21:39:16,088 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.93 vs. limit=15.0 2023-11-18 21:39:22,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=421113.3333333333, ans=0.125 2023-11-18 21:39:23,545 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 3050, loss[loss=0.09156, simple_loss=0.1094, pruned_loss=0.02664, audio_tagging_loss=0.01024, over 15395.00 frames. ], tot_loss[loss=0.09819, simple_loss=0.114, pruned_loss=0.03006, audio_tagging_loss=0.01111, over 3049170.54 frames. ], batch size: 56, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:39:55,079 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:40:19,081 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 3100, loss[loss=0.07904, simple_loss=0.08874, pruned_loss=0.0236, audio_tagging_loss=0.01107, over 16509.00 frames. ], tot_loss[loss=0.09812, simple_loss=0.1138, pruned_loss=0.02997, audio_tagging_loss=0.01126, over 3046351.00 frames. ], batch size: 62, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:40:50,976 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.80 vs. limit=15.0 2023-11-18 21:40:55,634 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.206e+01 8.959e+01 9.848e+01 1.091e+02 1.372e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-18 21:40:55,879 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 21:41:06,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=421713.3333333333, ans=0.125 2023-11-18 21:41:14,376 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 3150, loss[loss=0.1338, simple_loss=0.1532, pruned_loss=0.04648, audio_tagging_loss=0.01067, over 15682.00 frames. ], tot_loss[loss=0.0979, simple_loss=0.1137, pruned_loss=0.02984, audio_tagging_loss=0.01121, over 3043366.01 frames. 
], batch size: 57, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:41:46,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=421980.0, ans=0.0 2023-11-18 21:42:00,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=422046.6666666667, ans=0.125 2023-11-18 21:42:03,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=422046.6666666667, ans=0.0 2023-11-18 21:42:06,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=422046.6666666667, ans=0.0 2023-11-18 21:42:08,486 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.17 vs. limit=6.0 2023-11-18 21:42:10,393 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 3200, loss[loss=0.1502, simple_loss=0.1727, pruned_loss=0.0561, audio_tagging_loss=0.007739, over 14993.00 frames. ], tot_loss[loss=0.09855, simple_loss=0.1145, pruned_loss=0.03006, audio_tagging_loss=0.01124, over 3044552.68 frames. ], batch size: 54, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:42:14,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=422113.3333333333, ans=0.1 2023-11-18 21:42:18,435 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.69 vs. limit=22.5 2023-11-18 21:42:46,573 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.629e+01 9.130e+01 9.753e+01 1.111e+02 1.645e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-18 21:43:03,961 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.80 vs. limit=15.0 2023-11-18 21:43:05,605 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 3250, loss[loss=0.09044, simple_loss=0.1098, pruned_loss=0.02504, audio_tagging_loss=0.0105, over 15859.00 frames. ], tot_loss[loss=0.09736, simple_loss=0.1131, pruned_loss=0.02946, audio_tagging_loss=0.01136, over 3045284.61 frames. ], batch size: 58, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:43:12,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=422446.6666666667, ans=0.0 2023-11-18 21:44:01,574 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 3300, loss[loss=0.1239, simple_loss=0.1425, pruned_loss=0.04178, audio_tagging_loss=0.0109, over 15675.00 frames. ], tot_loss[loss=0.09764, simple_loss=0.1136, pruned_loss=0.02939, audio_tagging_loss=0.01145, over 3049147.74 frames. ], batch size: 57, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:44:07,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=422780.0, ans=0.0 2023-11-18 21:44:10,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=422780.0, ans=0.1 2023-11-18 21:44:21,493 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.75 vs. 
limit=22.5 2023-11-18 21:44:25,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=422913.3333333333, ans=0.0 2023-11-18 21:44:34,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=422980.0, ans=0.125 2023-11-18 21:44:38,005 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.927e+01 9.126e+01 1.034e+02 1.155e+02 1.977e+02, threshold=2.069e+02, percent-clipped=1.0 2023-11-18 21:44:54,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=423046.6666666667, ans=0.125 2023-11-18 21:44:57,143 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 3350, loss[loss=0.0969, simple_loss=0.1252, pruned_loss=0.02467, audio_tagging_loss=0.009628, over 15274.00 frames. ], tot_loss[loss=0.09725, simple_loss=0.1129, pruned_loss=0.02938, audio_tagging_loss=0.01142, over 3050087.79 frames. ], batch size: 57, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:45:00,742 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.46 vs. limit=8.0 2023-11-18 21:45:06,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=423113.3333333333, ans=0.1 2023-11-18 21:45:10,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=423180.0, ans=0.0 2023-11-18 21:45:11,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=423180.0, ans=0.1 2023-11-18 21:45:32,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=423313.3333333333, ans=0.2 2023-11-18 21:45:48,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=423380.0, ans=0.0 2023-11-18 21:45:52,878 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 3400, loss[loss=0.1068, simple_loss=0.1289, pruned_loss=0.0347, audio_tagging_loss=0.007695, over 15759.00 frames. ], tot_loss[loss=0.09771, simple_loss=0.1136, pruned_loss=0.02965, audio_tagging_loss=0.01124, over 3050035.25 frames. 
], batch size: 59, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:46:02,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=423513.3333333333, ans=0.1 2023-11-18 21:46:13,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=423513.3333333333, ans=0.125 2023-11-18 21:46:16,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=423580.0, ans=0.2 2023-11-18 21:46:16,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=423580.0, ans=0.09899494936611666 2023-11-18 21:46:29,911 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.592e+01 8.844e+01 9.699e+01 1.055e+02 1.387e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-18 21:46:33,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=423646.6666666667, ans=0.1 2023-11-18 21:46:39,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=423713.3333333333, ans=0.125 2023-11-18 21:46:41,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=423713.3333333333, ans=0.0 2023-11-18 21:46:46,321 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.50 vs. limit=6.0 2023-11-18 21:46:47,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=423780.0, ans=0.125 2023-11-18 21:46:47,941 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 3450, loss[loss=0.07268, simple_loss=0.08497, pruned_loss=0.02123, audio_tagging_loss=0.008968, over 14032.00 frames. ], tot_loss[loss=0.09817, simple_loss=0.1142, pruned_loss=0.02985, audio_tagging_loss=0.01121, over 3048203.90 frames. ], batch size: 55, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:46:54,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=423780.0, ans=0.125 2023-11-18 21:47:15,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=423913.3333333333, ans=0.125 2023-11-18 21:47:23,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=423980.0, ans=0.0 2023-11-18 21:47:44,724 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 3500, loss[loss=0.1175, simple_loss=0.1443, pruned_loss=0.03708, audio_tagging_loss=0.00829, over 15676.00 frames. ], tot_loss[loss=0.09805, simple_loss=0.1144, pruned_loss=0.02979, audio_tagging_loss=0.01107, over 3044485.31 frames. 
], batch size: 56, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:47:44,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=424113.3333333333, ans=0.09899494936611666 2023-11-18 21:47:56,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=424180.0, ans=0.125 2023-11-18 21:48:07,515 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.81 vs. limit=12.0 2023-11-18 21:48:08,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=424246.6666666667, ans=0.2 2023-11-18 21:48:11,272 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:48:17,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=424313.3333333333, ans=0.0 2023-11-18 21:48:21,880 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.688e+01 9.190e+01 1.062e+02 1.231e+02 1.599e+02, threshold=2.123e+02, percent-clipped=0.0 2023-11-18 21:48:33,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=424380.0, ans=0.125 2023-11-18 21:48:40,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=424446.6666666667, ans=0.125 2023-11-18 21:48:41,130 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 3550, loss[loss=0.08529, simple_loss=0.09931, pruned_loss=0.02391, audio_tagging_loss=0.01173, over 15139.00 frames. ], tot_loss[loss=0.09703, simple_loss=0.113, pruned_loss=0.02958, audio_tagging_loss=0.01093, over 3035873.56 frames. ], batch size: 57, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:48:43,751 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.71 vs. limit=22.5 2023-11-18 21:48:49,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=424446.6666666667, ans=0.2 2023-11-18 21:48:51,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=424513.3333333333, ans=0.2 2023-11-18 21:49:00,899 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.37 vs. limit=22.5 2023-11-18 21:49:05,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=424580.0, ans=0.1 2023-11-18 21:49:11,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=424580.0, ans=0.125 2023-11-18 21:49:23,508 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.97 vs. 
limit=22.5 2023-11-18 21:49:30,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=424713.3333333333, ans=0.2 2023-11-18 21:49:30,986 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.68 vs. limit=15.0 2023-11-18 21:49:32,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=424713.3333333333, ans=0.125 2023-11-18 21:49:36,613 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 3600, loss[loss=0.08199, simple_loss=0.09917, pruned_loss=0.02382, audio_tagging_loss=0.008583, over 14255.00 frames. ], tot_loss[loss=0.09757, simple_loss=0.1137, pruned_loss=0.02981, audio_tagging_loss=0.0109, over 3042767.28 frames. ], batch size: 56, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:49:58,556 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=424913.3333333333, ans=0.125 2023-11-18 21:50:13,743 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.229e+01 9.290e+01 1.027e+02 1.176e+02 1.572e+02, threshold=2.055e+02, percent-clipped=0.0 2023-11-18 21:50:13,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=424980.0, ans=0.0 2023-11-18 21:50:14,398 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.03 vs. limit=10.0 2023-11-18 21:50:18,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=424980.0, ans=0.1 2023-11-18 21:50:32,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=425113.3333333333, ans=0.125 2023-11-18 21:50:33,112 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 3650, loss[loss=0.09408, simple_loss=0.1179, pruned_loss=0.02454, audio_tagging_loss=0.01058, over 15394.00 frames. ], tot_loss[loss=0.09682, simple_loss=0.1127, pruned_loss=0.0296, audio_tagging_loss=0.01085, over 3041375.07 frames. ], batch size: 59, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:50:46,037 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.26 vs. limit=15.0 2023-11-18 21:50:49,189 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0 2023-11-18 21:51:02,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=425246.6666666667, ans=0.125 2023-11-18 21:51:10,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=425313.3333333333, ans=0.0 2023-11-18 21:51:22,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=425380.0, ans=0.015 2023-11-18 21:51:25,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=425380.0, ans=0.0 2023-11-18 21:51:29,198 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 3700, loss[loss=0.06001, simple_loss=0.06773, pruned_loss=0.0146, audio_tagging_loss=0.01154, over 15365.00 frames. 
], tot_loss[loss=0.09818, simple_loss=0.1142, pruned_loss=0.03013, audio_tagging_loss=0.01096, over 3050091.52 frames. ], batch size: 60, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:51:43,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=425513.3333333333, ans=0.1 2023-11-18 21:51:46,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=425513.3333333333, ans=0.015 2023-11-18 21:52:01,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=425646.6666666667, ans=0.1 2023-11-18 21:52:06,434 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.585e+01 8.991e+01 9.791e+01 1.095e+02 1.443e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-18 21:52:10,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=425646.6666666667, ans=0.1 2023-11-18 21:52:25,161 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 3750, loss[loss=0.07147, simple_loss=0.06819, pruned_loss=0.01921, audio_tagging_loss=0.01816, over 14511.00 frames. ], tot_loss[loss=0.09841, simple_loss=0.1146, pruned_loss=0.03012, audio_tagging_loss=0.01099, over 3046240.76 frames. ], batch size: 57, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:52:32,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=425780.0, ans=0.0 2023-11-18 21:52:41,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=425846.6666666667, ans=0.125 2023-11-18 21:52:44,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=425846.6666666667, ans=0.125 2023-11-18 21:53:02,237 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:53:03,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=425980.0, ans=0.125 2023-11-18 21:53:13,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=426046.6666666667, ans=0.0 2023-11-18 21:53:21,307 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 3800, loss[loss=0.09735, simple_loss=0.1156, pruned_loss=0.02762, audio_tagging_loss=0.01191, over 14703.00 frames. ], tot_loss[loss=0.09976, simple_loss=0.1163, pruned_loss=0.03058, audio_tagging_loss=0.01104, over 3050467.18 frames. 
], batch size: 52, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:53:56,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=426313.3333333333, ans=0.0 2023-11-18 21:53:57,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=426313.3333333333, ans=0.0 2023-11-18 21:53:57,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=426313.3333333333, ans=0.1 2023-11-18 21:53:57,909 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.638e+01 8.759e+01 9.502e+01 1.058e+02 1.503e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-18 21:54:16,855 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 3850, loss[loss=0.07734, simple_loss=0.08328, pruned_loss=0.02027, audio_tagging_loss=0.01542, over 14786.00 frames. ], tot_loss[loss=0.09932, simple_loss=0.1155, pruned_loss=0.03034, audio_tagging_loss=0.01124, over 3042883.83 frames. ], batch size: 58, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:54:23,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=426446.6666666667, ans=6.0 2023-11-18 21:54:25,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=426446.6666666667, ans=0.125 2023-11-18 21:54:44,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=426580.0, ans=0.2 2023-11-18 21:55:14,857 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 3900, loss[loss=0.131, simple_loss=0.1569, pruned_loss=0.04339, audio_tagging_loss=0.00914, over 15625.00 frames. ], tot_loss[loss=0.1005, simple_loss=0.1167, pruned_loss=0.03078, audio_tagging_loss=0.01132, over 3047899.58 frames. ], batch size: 54, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:55:31,378 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.80 vs. limit=12.0 2023-11-18 21:55:45,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=426913.3333333333, ans=0.04949747468305833 2023-11-18 21:55:46,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=426913.3333333333, ans=0.0 2023-11-18 21:55:51,454 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.999e+01 9.434e+01 1.040e+02 1.132e+02 1.500e+02, threshold=2.079e+02, percent-clipped=0.0 2023-11-18 21:55:57,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=426980.0, ans=0.2 2023-11-18 21:56:10,953 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 3950, loss[loss=0.1159, simple_loss=0.1257, pruned_loss=0.04431, audio_tagging_loss=0.008733, over 15226.00 frames. ], tot_loss[loss=0.1004, simple_loss=0.1165, pruned_loss=0.03069, audio_tagging_loss=0.01144, over 3043355.92 frames. 
], batch size: 58, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:56:14,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=427113.3333333333, ans=0.95 2023-11-18 21:56:15,418 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=427113.3333333333, ans=0.1 2023-11-18 21:56:17,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=427113.3333333333, ans=0.125 2023-11-18 21:56:34,612 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2023-11-18 21:56:46,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=427313.3333333333, ans=0.0 2023-11-18 21:56:54,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=427313.3333333333, ans=0.05 2023-11-18 21:57:07,300 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 4000, loss[loss=0.1088, simple_loss=0.1291, pruned_loss=0.03606, audio_tagging_loss=0.008218, over 14302.00 frames. ], tot_loss[loss=0.1008, simple_loss=0.1172, pruned_loss=0.0308, audio_tagging_loss=0.01146, over 3039200.15 frames. ], batch size: 53, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:57:15,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=427446.6666666667, ans=0.1 2023-11-18 21:57:24,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=427513.3333333333, ans=0.125 2023-11-18 21:57:27,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=427513.3333333333, ans=0.0 2023-11-18 21:57:43,817 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.662e+01 9.312e+01 1.008e+02 1.147e+02 1.511e+02, threshold=2.016e+02, percent-clipped=0.0 2023-11-18 21:57:44,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=427646.6666666667, ans=0.125 2023-11-18 21:57:47,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=427646.6666666667, ans=0.125 2023-11-18 21:57:49,138 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.07 vs. limit=22.5 2023-11-18 21:58:02,415 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 4050, loss[loss=0.1162, simple_loss=0.1376, pruned_loss=0.03529, audio_tagging_loss=0.01209, over 15206.00 frames. ], tot_loss[loss=0.1007, simple_loss=0.1175, pruned_loss=0.03065, audio_tagging_loss=0.01137, over 3044584.34 frames. ], batch size: 56, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:58:03,494 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:58:26,621 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=427913.3333333333, ans=0.0 2023-11-18 21:58:56,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=428046.6666666667, ans=0.0 2023-11-18 21:58:59,665 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 4100, loss[loss=0.08693, simple_loss=0.09746, pruned_loss=0.02512, audio_tagging_loss=0.01308, over 14424.00 frames. ], tot_loss[loss=0.1, simple_loss=0.1168, pruned_loss=0.03033, audio_tagging_loss=0.01126, over 3048389.30 frames. ], batch size: 56, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:59:17,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=428180.0, ans=0.025 2023-11-18 21:59:23,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=428246.6666666667, ans=0.125 2023-11-18 21:59:35,729 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.382e+01 8.951e+01 9.681e+01 1.090e+02 3.452e+02, threshold=1.936e+02, percent-clipped=1.0 2023-11-18 21:59:50,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=428380.0, ans=0.0 2023-11-18 21:59:55,354 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 4150, loss[loss=0.09342, simple_loss=0.1055, pruned_loss=0.02953, audio_tagging_loss=0.01116, over 15445.00 frames. ], tot_loss[loss=0.0984, simple_loss=0.1147, pruned_loss=0.02979, audio_tagging_loss=0.01128, over 3043840.19 frames. ], batch size: 57, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 22:00:34,704 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 22:00:40,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=428713.3333333333, ans=0.0 2023-11-18 22:00:46,904 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.38 vs. limit=22.5 2023-11-18 22:00:50,539 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 4200, loss[loss=0.1034, simple_loss=0.1235, pruned_loss=0.02731, audio_tagging_loss=0.01435, over 16129.00 frames. ], tot_loss[loss=0.09785, simple_loss=0.1144, pruned_loss=0.02959, audio_tagging_loss=0.01106, over 3039843.25 frames. ], batch size: 61, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 22:01:06,542 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.34 vs. 
limit=15.0 2023-11-18 22:01:09,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=428846.6666666667, ans=0.125 2023-11-18 22:01:27,562 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.512e+01 8.609e+01 9.394e+01 1.081e+02 1.374e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-18 22:01:33,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=428980.0, ans=0.2 2023-11-18 22:01:41,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=429046.6666666667, ans=0.2 2023-11-18 22:01:43,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=429046.6666666667, ans=0.95 2023-11-18 22:01:44,633 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.04 vs. limit=12.0 2023-11-18 22:01:46,225 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 4250, loss[loss=0.08915, simple_loss=0.1117, pruned_loss=0.02289, audio_tagging_loss=0.0104, over 15680.00 frames. ], tot_loss[loss=0.09848, simple_loss=0.1152, pruned_loss=0.02982, audio_tagging_loss=0.01104, over 3044096.28 frames. ], batch size: 56, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 22:02:13,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=429246.6666666667, ans=0.1 2023-11-18 22:02:19,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=429313.3333333333, ans=0.1 2023-11-18 22:02:20,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=429313.3333333333, ans=0.125 2023-11-18 22:02:39,983 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.14 vs. limit=22.5 2023-11-18 22:02:43,270 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 4300, loss[loss=0.08844, simple_loss=0.1043, pruned_loss=0.02495, audio_tagging_loss=0.01133, over 15940.00 frames. ], tot_loss[loss=0.0989, simple_loss=0.1159, pruned_loss=0.02995, audio_tagging_loss=0.01099, over 3040436.36 frames. ], batch size: 63, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 22:02:45,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=429446.6666666667, ans=0.125 2023-11-18 22:02:53,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=429513.3333333333, ans=0.2 2023-11-18 22:03:05,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=429580.0, ans=0.125 2023-11-18 22:03:20,036 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.343e+01 9.239e+01 1.003e+02 1.122e+02 1.597e+02, threshold=2.006e+02, percent-clipped=0.0 2023-11-18 22:03:38,790 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 4350, loss[loss=0.07515, simple_loss=0.08994, pruned_loss=0.02017, audio_tagging_loss=0.01001, over 15675.00 frames. ], tot_loss[loss=0.09832, simple_loss=0.1149, pruned_loss=0.02987, audio_tagging_loss=0.01099, over 3038633.38 frames. 
], batch size: 59, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 22:03:45,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=429780.0, ans=0.125 2023-11-18 22:04:04,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=429913.3333333333, ans=0.1 2023-11-18 22:04:09,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=429913.3333333333, ans=0.0 2023-11-18 22:04:34,584 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 4400, loss[loss=0.1137, simple_loss=0.1342, pruned_loss=0.03647, audio_tagging_loss=0.01008, over 16799.00 frames. ], tot_loss[loss=0.09724, simple_loss=0.1136, pruned_loss=0.02945, audio_tagging_loss=0.01097, over 3035331.55 frames. ], batch size: 63, lr: 1.16e-02, grad_scale: 64.0 2023-11-18 22:04:40,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=430113.3333333333, ans=0.0 2023-11-18 22:04:56,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=430246.6666666667, ans=0.025 2023-11-18 22:04:59,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=430246.6666666667, ans=0.0 2023-11-18 22:05:08,442 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=430313.3333333333, ans=0.125 2023-11-18 22:05:11,341 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.513e+01 8.875e+01 9.886e+01 1.073e+02 1.418e+02, threshold=1.977e+02, percent-clipped=0.0 2023-11-18 22:05:13,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=430313.3333333333, ans=0.125 2023-11-18 22:05:18,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=430380.0, ans=0.2 2023-11-18 22:05:29,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=430380.0, ans=0.05 2023-11-18 22:05:31,700 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 4450, loss[loss=0.1061, simple_loss=0.1341, pruned_loss=0.03067, audio_tagging_loss=0.0084, over 14791.00 frames. ], tot_loss[loss=0.09647, simple_loss=0.1128, pruned_loss=0.02909, audio_tagging_loss=0.01097, over 3027892.52 frames. ], batch size: 55, lr: 1.16e-02, grad_scale: 64.0 2023-11-18 22:05:46,441 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.05 vs. limit=6.0 2023-11-18 22:05:51,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=430513.3333333333, ans=0.0 2023-11-18 22:05:57,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=430580.0, ans=0.1 2023-11-18 22:05:58,239 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.50 vs. 
limit=15.0 2023-11-18 22:06:01,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=430580.0, ans=0.04949747468305833 2023-11-18 22:06:15,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=430713.3333333333, ans=0.125 2023-11-18 22:06:21,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=430713.3333333333, ans=0.1 2023-11-18 22:06:26,826 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 4500, loss[loss=0.0965, simple_loss=0.11, pruned_loss=0.02813, audio_tagging_loss=0.01338, over 14262.00 frames. ], tot_loss[loss=0.09724, simple_loss=0.1135, pruned_loss=0.02946, audio_tagging_loss=0.01103, over 3031994.39 frames. ], batch size: 55, lr: 1.16e-02, grad_scale: 64.0 2023-11-18 22:06:28,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=430780.0, ans=0.125 2023-11-18 22:06:34,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=430780.0, ans=0.125 2023-11-18 22:06:36,874 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.74 vs. limit=15.0 2023-11-18 22:06:47,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=430846.6666666667, ans=0.2 2023-11-18 22:07:03,970 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.610e+01 9.076e+01 9.936e+01 1.124e+02 1.630e+02, threshold=1.987e+02, percent-clipped=0.0 2023-11-18 22:07:10,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=431046.6666666667, ans=0.125 2023-11-18 22:07:16,201 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.93 vs. limit=6.0 2023-11-18 22:07:19,160 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.76 vs. limit=15.0 2023-11-18 22:07:22,445 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 4550, loss[loss=0.1175, simple_loss=0.1489, pruned_loss=0.03344, audio_tagging_loss=0.009639, over 15900.00 frames. ], tot_loss[loss=0.09715, simple_loss=0.1134, pruned_loss=0.02939, audio_tagging_loss=0.01106, over 3039216.95 frames. ], batch size: 57, lr: 1.16e-02, grad_scale: 64.0 2023-11-18 22:08:02,185 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 22:08:08,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=431380.0, ans=0.0 2023-11-18 22:08:09,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=431380.0, ans=0.125 2023-11-18 22:08:18,465 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 4600, loss[loss=0.1114, simple_loss=0.1304, pruned_loss=0.03275, audio_tagging_loss=0.01347, over 15563.00 frames. ], tot_loss[loss=0.09609, simple_loss=0.1118, pruned_loss=0.02901, audio_tagging_loss=0.01117, over 3039800.42 frames. ], batch size: 57, lr: 1.16e-02, grad_scale: 64.0 2023-11-18 22:08:29,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=431513.3333333333, ans=0.0 2023-11-18 22:08:32,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=431513.3333333333, ans=0.1 2023-11-18 22:08:39,504 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 22:08:49,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=431580.0, ans=0.1 2023-11-18 22:08:55,073 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.849e+01 8.959e+01 9.865e+01 1.112e+02 1.512e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-18 22:08:58,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=431646.6666666667, ans=0.125 2023-11-18 22:09:14,276 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 4650, loss[loss=0.09192, simple_loss=0.1146, pruned_loss=0.02433, audio_tagging_loss=0.01028, over 15336.00 frames. ], tot_loss[loss=0.09558, simple_loss=0.1109, pruned_loss=0.0288, audio_tagging_loss=0.01134, over 3042691.59 frames. ], batch size: 56, lr: 1.16e-02, grad_scale: 64.0 2023-11-18 22:09:14,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=431780.0, ans=0.125 2023-11-18 22:09:39,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=431913.3333333333, ans=0.125 2023-11-18 22:10:03,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=432046.6666666667, ans=0.04949747468305833 2023-11-18 22:10:05,166 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.57 vs. limit=22.5 2023-11-18 22:10:08,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=432113.3333333333, ans=0.125 2023-11-18 22:10:09,896 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 4700, loss[loss=0.07295, simple_loss=0.07967, pruned_loss=0.0227, audio_tagging_loss=0.01042, over 16266.00 frames. ], tot_loss[loss=0.09638, simple_loss=0.1119, pruned_loss=0.02907, audio_tagging_loss=0.01137, over 3045153.73 frames. 
], batch size: 65, lr: 1.16e-02, grad_scale: 64.0 2023-11-18 22:10:10,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=432113.3333333333, ans=0.125 2023-11-18 22:10:14,740 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=432113.3333333333, ans=0.035 2023-11-18 22:10:17,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=432113.3333333333, ans=0.0 2023-11-18 22:10:23,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=432180.0, ans=0.2 2023-11-18 22:10:26,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=432180.0, ans=0.0 2023-11-18 22:10:46,620 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.81 vs. limit=15.0 2023-11-18 22:10:47,009 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 8.805e+01 9.825e+01 1.121e+02 1.529e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-18 22:10:56,500 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.94 vs. limit=15.0 2023-11-18 22:11:04,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=432380.0, ans=0.125 2023-11-18 22:11:06,137 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 4750, loss[loss=0.1203, simple_loss=0.1338, pruned_loss=0.04274, audio_tagging_loss=0.01068, over 14437.00 frames. ], tot_loss[loss=0.09658, simple_loss=0.1121, pruned_loss=0.02914, audio_tagging_loss=0.01138, over 3045975.61 frames. ], batch size: 55, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:11:08,353 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.01 vs. limit=15.0 2023-11-18 22:11:14,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=432446.6666666667, ans=0.125 2023-11-18 22:11:16,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=432513.3333333333, ans=0.125 2023-11-18 22:11:18,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=432513.3333333333, ans=0.125 2023-11-18 22:11:59,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=432713.3333333333, ans=0.0 2023-11-18 22:12:02,253 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 4800, loss[loss=0.1335, simple_loss=0.1591, pruned_loss=0.04285, audio_tagging_loss=0.0111, over 15031.00 frames. ], tot_loss[loss=0.09588, simple_loss=0.1112, pruned_loss=0.02873, audio_tagging_loss=0.01157, over 3043399.61 frames. 
], batch size: 55, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:12:05,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=432780.0, ans=0.125 2023-11-18 22:12:16,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=432846.6666666667, ans=0.125 2023-11-18 22:12:19,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=432846.6666666667, ans=0.0 2023-11-18 22:12:30,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=432913.3333333333, ans=0.0 2023-11-18 22:12:35,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=432980.0, ans=0.0 2023-11-18 22:12:36,148 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.14 vs. limit=22.5 2023-11-18 22:12:39,893 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.105e+01 8.787e+01 9.594e+01 1.036e+02 1.388e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-18 22:12:55,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=433046.6666666667, ans=0.0 2023-11-18 22:12:57,429 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 4850, loss[loss=0.09342, simple_loss=0.1053, pruned_loss=0.02785, audio_tagging_loss=0.01291, over 14056.00 frames. ], tot_loss[loss=0.09712, simple_loss=0.1126, pruned_loss=0.02916, audio_tagging_loss=0.01165, over 3040728.34 frames. ], batch size: 56, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:13:09,590 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.54 vs. limit=15.0 2023-11-18 22:13:13,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=433180.0, ans=0.125 2023-11-18 22:13:36,592 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.20 vs. limit=22.5 2023-11-18 22:13:41,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=433380.0, ans=0.1 2023-11-18 22:13:44,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=433380.0, ans=0.125 2023-11-18 22:13:53,998 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 4900, loss[loss=0.09483, simple_loss=0.1027, pruned_loss=0.03111, audio_tagging_loss=0.01235, over 14180.00 frames. ], tot_loss[loss=0.09709, simple_loss=0.113, pruned_loss=0.02912, audio_tagging_loss=0.01146, over 3036547.00 frames. ], batch size: 53, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:14:01,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=433446.6666666667, ans=0.125 2023-11-18 22:14:04,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=433513.3333333333, ans=0.125 2023-11-18 22:14:05,557 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.43 vs. 
limit=10.0 2023-11-18 22:14:14,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=433513.3333333333, ans=0.1 2023-11-18 22:14:20,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=433580.0, ans=0.125 2023-11-18 22:14:23,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=433580.0, ans=0.125 2023-11-18 22:14:32,279 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.047e+01 8.759e+01 9.245e+01 1.025e+02 1.316e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-18 22:14:37,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=433713.3333333333, ans=0.1 2023-11-18 22:14:49,980 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 4950, loss[loss=0.09376, simple_loss=0.1031, pruned_loss=0.02957, audio_tagging_loss=0.01264, over 14891.00 frames. ], tot_loss[loss=0.09679, simple_loss=0.1128, pruned_loss=0.0291, audio_tagging_loss=0.01128, over 3025504.91 frames. ], batch size: 56, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:15:31,884 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.09 vs. limit=15.0 2023-11-18 22:15:45,753 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 5000, loss[loss=0.1079, simple_loss=0.1329, pruned_loss=0.03224, audio_tagging_loss=0.009218, over 15851.00 frames. ], tot_loss[loss=0.09704, simple_loss=0.1133, pruned_loss=0.02917, audio_tagging_loss=0.01123, over 3033464.07 frames. ], batch size: 60, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:15:49,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=434113.3333333333, ans=0.125 2023-11-18 22:15:49,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=434113.3333333333, ans=0.125 2023-11-18 22:16:07,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=434246.6666666667, ans=0.0 2023-11-18 22:16:19,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=434313.3333333333, ans=0.0 2023-11-18 22:16:23,435 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.033e+01 8.790e+01 9.696e+01 1.074e+02 1.675e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-18 22:16:25,659 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.92 vs. limit=22.5 2023-11-18 22:16:30,975 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.61 vs. limit=15.0 2023-11-18 22:16:41,998 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 5050, loss[loss=0.1149, simple_loss=0.1367, pruned_loss=0.03451, audio_tagging_loss=0.01204, over 16208.00 frames. ], tot_loss[loss=0.09745, simple_loss=0.1144, pruned_loss=0.02922, audio_tagging_loss=0.01101, over 3037946.75 frames. 
], batch size: 61, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:16:47,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=434446.6666666667, ans=0.0 2023-11-18 22:16:58,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=434513.3333333333, ans=0.07 2023-11-18 22:16:59,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=434513.3333333333, ans=0.1 2023-11-18 22:17:14,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=434646.6666666667, ans=0.2 2023-11-18 22:17:29,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=434713.3333333333, ans=0.0 2023-11-18 22:17:38,165 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 5100, loss[loss=0.08833, simple_loss=0.09884, pruned_loss=0.0241, audio_tagging_loss=0.01481, over 16131.00 frames. ], tot_loss[loss=0.09784, simple_loss=0.1149, pruned_loss=0.02946, audio_tagging_loss=0.01094, over 3037909.51 frames. ], batch size: 58, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:18:02,028 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.31 vs. limit=15.0 2023-11-18 22:18:16,619 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.735e+01 9.607e+01 1.051e+02 1.879e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-18 22:18:33,465 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 5150, loss[loss=0.1065, simple_loss=0.1282, pruned_loss=0.03328, audio_tagging_loss=0.00915, over 14898.00 frames. ], tot_loss[loss=0.09726, simple_loss=0.114, pruned_loss=0.02938, audio_tagging_loss=0.01086, over 3045247.12 frames. ], batch size: 57, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:18:42,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=435113.3333333333, ans=0.0 2023-11-18 22:18:43,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=435113.3333333333, ans=0.1 2023-11-18 22:18:51,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=435180.0, ans=0.125 2023-11-18 22:19:04,897 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.20 vs. limit=22.5 2023-11-18 22:19:16,600 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.80 vs. limit=15.0 2023-11-18 22:19:30,435 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 5200, loss[loss=0.1138, simple_loss=0.139, pruned_loss=0.03567, audio_tagging_loss=0.008626, over 15635.00 frames. ], tot_loss[loss=0.09774, simple_loss=0.1147, pruned_loss=0.02962, audio_tagging_loss=0.01077, over 3043937.07 frames. 
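Each scaling.py ScheduledFloat line above reports the current value ("ans") of a schedulable hyperparameter at the given batch_count; by this point in training most skip rates have decayed to 0.0. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints (the breakpoints below are illustrative, not this run's):

class ScheduledFloat:
    """A float whose value is piecewise-linearly interpolated
    against the global batch count."""

    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) pairs

    def value_at(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

conv_skip_rate = ScheduledFloat((0.0, 0.2), (8000.0, 0.0))
print(conv_skip_rate.value_at(434446.7))  # -> 0.0, as in the ans=0.0 lines above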
], batch size: 61, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:19:34,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=435446.6666666667, ans=0.125 2023-11-18 22:19:43,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=435513.3333333333, ans=0.0 2023-11-18 22:19:50,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=435513.3333333333, ans=0.0 2023-11-18 22:20:04,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=435646.6666666667, ans=0.0 2023-11-18 22:20:09,690 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.828e+01 8.948e+01 9.784e+01 1.083e+02 1.629e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-18 22:20:15,267 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.87 vs. limit=15.0 2023-11-18 22:20:17,130 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.07 vs. limit=15.0 2023-11-18 22:20:25,526 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 5250, loss[loss=0.09628, simple_loss=0.1187, pruned_loss=0.0293, audio_tagging_loss=0.007623, over 14613.00 frames. ], tot_loss[loss=0.09812, simple_loss=0.1154, pruned_loss=0.02971, audio_tagging_loss=0.0107, over 3056249.41 frames. ], batch size: 54, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:20:25,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=435780.0, ans=0.1 2023-11-18 22:20:35,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=435846.6666666667, ans=10.0 2023-11-18 22:20:42,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=435846.6666666667, ans=0.2 2023-11-18 22:20:52,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=435913.3333333333, ans=0.125 2023-11-18 22:20:57,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=435913.3333333333, ans=0.2 2023-11-18 22:20:59,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=435980.0, ans=0.1 2023-11-18 22:21:12,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=436046.6666666667, ans=0.1 2023-11-18 22:21:20,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=436113.3333333333, ans=0.07 2023-11-18 22:21:21,102 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 5300, loss[loss=0.08441, simple_loss=0.0958, pruned_loss=0.02735, audio_tagging_loss=0.009158, over 14412.00 frames. ], tot_loss[loss=0.09704, simple_loss=0.1139, pruned_loss=0.02932, audio_tagging_loss=0.01079, over 3053862.36 frames. 
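The Whitening lines (scaling.py:1022) compare a whiteness statistic of a module's activations against a limit; a value near 1.0 means the feature covariance is already close to a multiple of the identity, and readings over the limit (e.g. metric=6.01 vs. limit=6.0 above) are what trigger the corrective whitening penalty. A rough sketch of one way such a metric can be computed, assuming it is mean(diag(C^2)) / mean(diag(C))^2 for the per-group covariance C; the exact definition in scaling.py may differ:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    # x: (..., num_channels). Returns ~1.0 for a white covariance and grows
    # as the eigenvalue spectrum becomes more uneven; scale-invariant.
    x = x.reshape(-1, x.shape[-1])
    n, d = x.shape
    x = x.reshape(n, num_groups, d // num_groups).transpose(0, 1)  # (G, n, d/G)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x) / n                   # (G, d/G, d/G)
    num = torch.diagonal(torch.matmul(cov, cov), dim1=1, dim2=2).mean()
    den = torch.diagonal(cov, dim1=1, dim2=2).mean() ** 2
    return num / (den + 1e-20)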
], batch size: 55, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:21:37,105 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.83 vs. limit=6.0 2023-11-18 22:21:53,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=436246.6666666667, ans=0.0 2023-11-18 22:21:55,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=436313.3333333333, ans=0.125 2023-11-18 22:22:01,696 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.866e+01 8.659e+01 9.451e+01 1.050e+02 1.358e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-18 22:22:15,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=436380.0, ans=0.125 2023-11-18 22:22:17,478 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 5350, loss[loss=0.1036, simple_loss=0.1228, pruned_loss=0.02962, audio_tagging_loss=0.01255, over 16050.00 frames. ], tot_loss[loss=0.09746, simple_loss=0.1144, pruned_loss=0.02948, audio_tagging_loss=0.01076, over 3053215.49 frames. ], batch size: 57, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:22:51,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=436646.6666666667, ans=0.125 2023-11-18 22:22:54,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=436646.6666666667, ans=0.125 2023-11-18 22:23:03,872 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.01 vs. limit=6.0 2023-11-18 22:23:13,374 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 5400, loss[loss=0.09323, simple_loss=0.115, pruned_loss=0.02568, audio_tagging_loss=0.01005, over 15670.00 frames. ], tot_loss[loss=0.09817, simple_loss=0.1152, pruned_loss=0.02977, audio_tagging_loss=0.01082, over 3048845.12 frames. ], batch size: 58, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:23:19,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=436780.0, ans=0.09899494936611666 2023-11-18 22:23:26,335 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 22:23:50,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=436980.0, ans=0.015 2023-11-18 22:23:53,556 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.845e+01 9.112e+01 1.017e+02 1.141e+02 1.585e+02, threshold=2.034e+02, percent-clipped=0.0 2023-11-18 22:23:55,241 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.88 vs. limit=12.0 2023-11-18 22:24:00,634 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.98 vs. 
limit=15.0 2023-11-18 22:24:02,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=437046.6666666667, ans=0.0 2023-11-18 22:24:08,343 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 5450, loss[loss=0.09957, simple_loss=0.1131, pruned_loss=0.03101, audio_tagging_loss=0.01201, over 16175.00 frames. ], tot_loss[loss=0.09809, simple_loss=0.1148, pruned_loss=0.02983, audio_tagging_loss=0.01087, over 3042081.02 frames. ], batch size: 61, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:24:49,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=437313.3333333333, ans=0.0 2023-11-18 22:24:56,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=437380.0, ans=0.125 2023-11-18 22:24:58,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=437380.0, ans=0.125 2023-11-18 22:25:04,300 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 5500, loss[loss=0.1157, simple_loss=0.1405, pruned_loss=0.03261, audio_tagging_loss=0.01285, over 15154.00 frames. ], tot_loss[loss=0.09811, simple_loss=0.1147, pruned_loss=0.02973, audio_tagging_loss=0.01104, over 3040107.82 frames. ], batch size: 55, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:25:05,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=437446.6666666667, ans=0.1 2023-11-18 22:25:11,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=437446.6666666667, ans=0.125 2023-11-18 22:25:32,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=437580.0, ans=0.0 2023-11-18 22:25:39,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=437646.6666666667, ans=0.0 2023-11-18 22:25:44,446 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.355e+01 8.778e+01 9.490e+01 1.043e+02 1.354e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-18 22:26:00,858 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 5550, loss[loss=0.1343, simple_loss=0.1563, pruned_loss=0.04537, audio_tagging_loss=0.01076, over 15555.00 frames. ], tot_loss[loss=0.09844, simple_loss=0.1148, pruned_loss=0.02983, audio_tagging_loss=0.01122, over 3051243.60 frames. ], batch size: 56, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:26:09,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=437780.0, ans=0.125 2023-11-18 22:26:10,826 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.02 vs. limit=10.0 2023-11-18 22:26:23,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=437913.3333333333, ans=0.5 2023-11-18 22:26:33,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=437980.0, ans=0.0 2023-11-18 22:26:35,408 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.69 vs. 
limit=22.5 2023-11-18 22:26:43,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=437980.0, ans=0.0 2023-11-18 22:26:51,054 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.61 vs. limit=6.0 2023-11-18 22:26:55,695 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 5600, loss[loss=0.07291, simple_loss=0.0757, pruned_loss=0.02151, audio_tagging_loss=0.01354, over 14767.00 frames. ], tot_loss[loss=0.09745, simple_loss=0.1135, pruned_loss=0.02936, audio_tagging_loss=0.01134, over 3049948.36 frames. ], batch size: 56, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:27:05,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=438180.0, ans=0.0 2023-11-18 22:27:11,248 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 22:27:25,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=438246.6666666667, ans=0.0 2023-11-18 22:27:34,841 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 22:27:35,842 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.410e+01 9.319e+01 9.867e+01 1.109e+02 1.388e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-18 22:27:41,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=438380.0, ans=0.1 2023-11-18 22:27:50,172 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.39 vs. limit=22.5 2023-11-18 22:27:51,225 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 5650, loss[loss=0.1094, simple_loss=0.1301, pruned_loss=0.03479, audio_tagging_loss=0.009548, over 15102.00 frames. ], tot_loss[loss=0.09803, simple_loss=0.1143, pruned_loss=0.02961, audio_tagging_loss=0.01129, over 3048960.50 frames. 
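In the optim.py lines above, the five numbers are quartiles (min/25%/50%/75%/max) of recent gradient norms, and the printed threshold is exactly Clipping_scale times the median (here 2.0 * 9.867e+01 = 1.973e+02). A sketch of a monitor with that behaviour; the windowed-history mechanism is an assumption inferred from the log, not lifted from optim.py:

import torch

class GradNormClipper:
    """Track recent total gradient norms, expose their quartiles, and
    derive a clipping threshold as clipping_scale * median."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.scale, self.window, self.norms = clipping_scale, window, []

    def threshold(self, model: torch.nn.Module) -> float:
        total = torch.norm(torch.stack(
            [p.grad.norm() for p in model.parameters() if p.grad is not None]))
        self.norms = (self.norms + [total])[-self.window:]
        q = torch.quantile(torch.stack(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        return (self.scale * q[2]).item()  # q[2] is the median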
], batch size: 56, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:27:58,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=438446.6666666667, ans=0.0 2023-11-18 22:28:01,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=438513.3333333333, ans=0.125 2023-11-18 22:28:09,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=438513.3333333333, ans=0.0 2023-11-18 22:28:14,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=438580.0, ans=0.125 2023-11-18 22:28:14,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=438580.0, ans=0.125 2023-11-18 22:28:26,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=438646.6666666667, ans=0.1 2023-11-18 22:28:32,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=438646.6666666667, ans=0.125 2023-11-18 22:28:37,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=438713.3333333333, ans=0.2 2023-11-18 22:28:42,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=438713.3333333333, ans=0.0 2023-11-18 22:28:45,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=438713.3333333333, ans=0.125 2023-11-18 22:28:46,919 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 5700, loss[loss=0.1156, simple_loss=0.1314, pruned_loss=0.03939, audio_tagging_loss=0.01057, over 15294.00 frames. ], tot_loss[loss=0.09803, simple_loss=0.1145, pruned_loss=0.02953, audio_tagging_loss=0.01125, over 3053262.82 frames. ], batch size: 56, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:28:53,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=438780.0, ans=0.0 2023-11-18 22:29:27,092 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.478e+01 8.890e+01 9.546e+01 1.053e+02 1.340e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-18 22:29:42,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=439113.3333333333, ans=0.125 2023-11-18 22:29:43,000 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 5750, loss[loss=0.07813, simple_loss=0.09141, pruned_loss=0.02066, audio_tagging_loss=0.01177, over 14977.00 frames. ], tot_loss[loss=0.0975, simple_loss=0.1136, pruned_loss=0.02942, audio_tagging_loss=0.0113, over 3049210.40 frames. ], batch size: 56, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:29:47,726 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.58 vs. 
limit=15.0 2023-11-18 22:30:09,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=439246.6666666667, ans=0.125 2023-11-18 22:30:15,384 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.65 vs. limit=15.0 2023-11-18 22:30:21,291 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.66 vs. limit=22.5 2023-11-18 22:30:26,601 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.35 vs. limit=15.0 2023-11-18 22:30:33,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=439380.0, ans=0.125 2023-11-18 22:30:37,464 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 5800, loss[loss=0.09489, simple_loss=0.112, pruned_loss=0.02478, audio_tagging_loss=0.01408, over 14814.00 frames. ], tot_loss[loss=0.09718, simple_loss=0.113, pruned_loss=0.02949, audio_tagging_loss=0.01118, over 3048334.59 frames. ], batch size: 57, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:30:39,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=439446.6666666667, ans=0.0 2023-11-18 22:30:41,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=439446.6666666667, ans=0.125 2023-11-18 22:30:41,659 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.57 vs. limit=15.0 2023-11-18 22:30:46,559 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.00 vs. limit=10.0 2023-11-18 22:30:50,700 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.27 vs. limit=15.0 2023-11-18 22:30:57,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=439513.3333333333, ans=0.125 2023-11-18 22:30:58,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=439513.3333333333, ans=0.0 2023-11-18 22:31:17,842 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.084e+01 8.421e+01 9.751e+01 1.061e+02 1.528e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-18 22:31:19,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=439646.6666666667, ans=0.125 2023-11-18 22:31:33,829 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 5850, loss[loss=0.08364, simple_loss=0.1059, pruned_loss=0.01926, audio_tagging_loss=0.01145, over 14040.00 frames. ], tot_loss[loss=0.09686, simple_loss=0.1126, pruned_loss=0.0294, audio_tagging_loss=0.01113, over 3051475.16 frames. 
], batch size: 53, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:31:42,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=439780.0, ans=0.0 2023-11-18 22:31:46,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=439846.6666666667, ans=0.125 2023-11-18 22:32:06,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=439980.0, ans=0.04949747468305833 2023-11-18 22:32:18,415 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2023-11-18 22:32:29,195 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.93 vs. limit=15.0 2023-11-18 22:32:29,691 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 5900, loss[loss=0.118, simple_loss=0.1368, pruned_loss=0.04202, audio_tagging_loss=0.00759, over 15773.00 frames. ], tot_loss[loss=0.0973, simple_loss=0.1136, pruned_loss=0.02942, audio_tagging_loss=0.01107, over 3047769.08 frames. ], batch size: 58, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:32:36,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=440113.3333333333, ans=0.125 2023-11-18 22:32:39,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=440180.0, ans=0.0 2023-11-18 22:32:49,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=440180.0, ans=0.035 2023-11-18 22:32:54,779 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.60 vs. limit=22.5 2023-11-18 22:33:00,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=440246.6666666667, ans=0.0 2023-11-18 22:33:09,415 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.910e+01 8.634e+01 9.404e+01 1.031e+02 1.635e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-18 22:33:24,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=440446.6666666667, ans=0.0 2023-11-18 22:33:24,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=440446.6666666667, ans=0.0 2023-11-18 22:33:25,047 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 5950, loss[loss=0.08169, simple_loss=0.09903, pruned_loss=0.02333, audio_tagging_loss=0.008851, over 15726.00 frames. ], tot_loss[loss=0.09708, simple_loss=0.1133, pruned_loss=0.02938, audio_tagging_loss=0.01103, over 3046687.80 frames. ], batch size: 61, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:33:30,267 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.67 vs. 
limit=22.5 2023-11-18 22:33:33,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=440446.6666666667, ans=0.0 2023-11-18 22:33:45,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=440513.3333333333, ans=0.125 2023-11-18 22:34:21,294 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 6000, loss[loss=0.09886, simple_loss=0.1105, pruned_loss=0.02945, audio_tagging_loss=0.01415, over 15913.00 frames. ], tot_loss[loss=0.09718, simple_loss=0.1131, pruned_loss=0.02957, audio_tagging_loss=0.01108, over 3039866.12 frames. ], batch size: 61, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:34:21,295 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-18 22:34:54,496 INFO [train_asr.py:1147] (3/4) Epoch 6, validation: loss=0.07034, simple_loss=0.0589, pruned_loss=0.008199, audio_tagging_loss=0.03269, over 4681554.00 frames. 2023-11-18 22:34:54,497 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-18 22:35:26,133 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.42 vs. limit=10.0 2023-11-18 22:35:33,137 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 22:35:34,148 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.824e+01 8.702e+01 9.372e+01 1.021e+02 1.628e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-18 22:35:39,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=441046.6666666667, ans=0.125 2023-11-18 22:35:41,622 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.01 vs. limit=10.0 2023-11-18 22:35:50,038 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 6050, loss[loss=0.1205, simple_loss=0.1387, pruned_loss=0.03964, audio_tagging_loss=0.01157, over 15715.00 frames. ], tot_loss[loss=0.09757, simple_loss=0.1137, pruned_loss=0.02969, audio_tagging_loss=0.01103, over 3043867.46 frames. ], batch size: 57, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:36:04,510 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2023-11-18 22:36:06,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=441180.0, ans=0.04949747468305833 2023-11-18 22:36:15,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=441246.6666666667, ans=0.2 2023-11-18 22:36:29,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=441313.3333333333, ans=0.125 2023-11-18 22:36:46,621 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 6100, loss[loss=0.08022, simple_loss=0.08985, pruned_loss=0.02518, audio_tagging_loss=0.01011, over 15896.00 frames. 
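The WARNING records above show why the 1-second AudioSet cuts carrying dummy transcripts get dropped: after 4x subsampling a 100-frame cut keeps only 23 frames, fewer than its 24 BPE tokens, so no monotonic transducer alignment exists. A sketch of that validity check; the subsampling arithmetic below is an assumption that happens to reproduce the logged 100 -> 23:

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Two successive convolutional subsamplings by 2 (assumed),
    # giving the overall factor of 4.
    frames_after = ((num_frames - 7) // 2 + 1) // 2
    return frames_after >= num_tokens

print(keep_cut(100, 24))  # False: 23 frames < 24 tokens, cut excluded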
], tot_loss[loss=0.09778, simple_loss=0.1135, pruned_loss=0.02989, audio_tagging_loss=0.01113, over 3041996.76 frames. ], batch size: 61, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:36:50,605 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.49 vs. limit=15.0 2023-11-18 22:37:03,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=441513.3333333333, ans=0.125 2023-11-18 22:37:26,318 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.360e+01 8.924e+01 9.623e+01 1.090e+02 1.421e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-18 22:37:39,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=441713.3333333333, ans=0.2 2023-11-18 22:37:41,749 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 6150, loss[loss=0.1008, simple_loss=0.1172, pruned_loss=0.02999, audio_tagging_loss=0.01215, over 14962.00 frames. ], tot_loss[loss=0.09726, simple_loss=0.1129, pruned_loss=0.02952, audio_tagging_loss=0.01129, over 3041798.79 frames. ], batch size: 54, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:37:41,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=441780.0, ans=0.125 2023-11-18 22:37:57,431 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.73 vs. limit=15.0 2023-11-18 22:38:04,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=441913.3333333333, ans=0.2 2023-11-18 22:38:07,059 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=22.5 2023-11-18 22:38:21,071 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 22:38:31,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=442046.6666666667, ans=0.025 2023-11-18 22:38:37,185 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 6200, loss[loss=0.08484, simple_loss=0.09548, pruned_loss=0.02588, audio_tagging_loss=0.01123, over 14267.00 frames. ], tot_loss[loss=0.09677, simple_loss=0.1123, pruned_loss=0.02925, audio_tagging_loss=0.01139, over 3044423.58 frames. ], batch size: 54, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:38:50,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=442180.0, ans=0.0 2023-11-18 22:38:52,913 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.09 vs. 
limit=22.5 2023-11-18 22:39:00,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=442246.6666666667, ans=0.2 2023-11-18 22:39:17,402 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.454e+01 8.898e+01 9.637e+01 1.062e+02 1.709e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-18 22:39:23,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=442380.0, ans=0.125 2023-11-18 22:39:33,417 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 6250, loss[loss=0.0972, simple_loss=0.1195, pruned_loss=0.02724, audio_tagging_loss=0.01021, over 17351.00 frames. ], tot_loss[loss=0.09698, simple_loss=0.1126, pruned_loss=0.02927, audio_tagging_loss=0.01141, over 3048380.05 frames. ], batch size: 65, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:39:36,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=442446.6666666667, ans=0.0 2023-11-18 22:40:08,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=442646.6666666667, ans=0.125 2023-11-18 22:40:16,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=442646.6666666667, ans=0.125 2023-11-18 22:40:16,706 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.55 vs. limit=15.0 2023-11-18 22:40:25,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=442713.3333333333, ans=0.2 2023-11-18 22:40:29,546 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 6300, loss[loss=0.1037, simple_loss=0.1214, pruned_loss=0.03198, audio_tagging_loss=0.01105, over 15501.00 frames. ], tot_loss[loss=0.09779, simple_loss=0.1137, pruned_loss=0.02947, audio_tagging_loss=0.01148, over 3051753.31 frames. ], batch size: 57, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:40:32,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=442780.0, ans=0.0 2023-11-18 22:40:38,489 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.23 vs. limit=10.0 2023-11-18 22:40:42,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=442846.6666666667, ans=10.0 2023-11-18 22:40:44,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=442846.6666666667, ans=0.125 2023-11-18 22:40:44,937 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=442846.6666666667, ans=0.0 2023-11-18 22:40:49,821 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=442846.6666666667, ans=0.125 2023-11-18 22:40:51,046 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.96 vs. limit=12.0 2023-11-18 22:41:03,874 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.77 vs. 
limit=22.5 2023-11-18 22:41:09,551 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.985e+01 9.817e+01 1.039e+02 1.348e+02, threshold=1.963e+02, percent-clipped=0.0 2023-11-18 22:41:18,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=443046.6666666667, ans=0.2 2023-11-18 22:41:23,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=443113.3333333333, ans=0.1 2023-11-18 22:41:24,955 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 6350, loss[loss=0.0899, simple_loss=0.1051, pruned_loss=0.02753, audio_tagging_loss=0.009815, over 14640.00 frames. ], tot_loss[loss=0.09847, simple_loss=0.1147, pruned_loss=0.02977, audio_tagging_loss=0.01136, over 3047224.20 frames. ], batch size: 57, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:41:28,932 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=443113.3333333333, ans=0.0 2023-11-18 22:41:43,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=443180.0, ans=0.0 2023-11-18 22:41:51,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=443246.6666666667, ans=0.0 2023-11-18 22:41:55,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=443246.6666666667, ans=0.0 2023-11-18 22:42:21,095 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 6400, loss[loss=0.05743, simple_loss=0.05718, pruned_loss=0.0144, audio_tagging_loss=0.01444, over 14191.00 frames. ], tot_loss[loss=0.09791, simple_loss=0.1138, pruned_loss=0.02954, audio_tagging_loss=0.01149, over 3041587.29 frames. ], batch size: 58, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:42:38,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=443513.3333333333, ans=0.2 2023-11-18 22:42:39,145 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.21 vs. limit=10.0 2023-11-18 22:42:41,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=443580.0, ans=0.0 2023-11-18 22:42:46,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=443580.0, ans=0.125 2023-11-18 22:42:47,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=443580.0, ans=0.125 2023-11-18 22:43:01,452 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.847e+01 8.672e+01 9.519e+01 1.064e+02 1.432e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-18 22:43:06,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=443713.3333333333, ans=0.125 2023-11-18 22:43:08,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=443713.3333333333, ans=0.0 2023-11-18 22:43:16,887 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 6450, loss[loss=0.07966, simple_loss=0.08537, pruned_loss=0.02547, audio_tagging_loss=0.0115, over 13751.00 frames. 
], tot_loss[loss=0.09739, simple_loss=0.1131, pruned_loss=0.02936, audio_tagging_loss=0.01149, over 3036172.06 frames. ], batch size: 55, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:43:21,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=443780.0, ans=0.125 2023-11-18 22:44:12,486 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 6500, loss[loss=0.1161, simple_loss=0.1462, pruned_loss=0.03602, audio_tagging_loss=0.006956, over 15478.00 frames. ], tot_loss[loss=0.09684, simple_loss=0.1127, pruned_loss=0.02904, audio_tagging_loss=0.01146, over 3040035.20 frames. ], batch size: 56, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:44:12,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=444113.3333333333, ans=0.0 2023-11-18 22:44:37,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=444246.6666666667, ans=0.125 2023-11-18 22:44:40,570 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=15.0 2023-11-18 22:44:48,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=444313.3333333333, ans=0.125 2023-11-18 22:44:52,547 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.212e+01 8.587e+01 9.452e+01 1.044e+02 1.613e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-18 22:44:53,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=444313.3333333333, ans=0.0 2023-11-18 22:44:54,301 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=15.0 2023-11-18 22:45:03,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=444380.0, ans=0.0 2023-11-18 22:45:09,097 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 6550, loss[loss=0.1043, simple_loss=0.1332, pruned_loss=0.02832, audio_tagging_loss=0.009433, over 15841.00 frames. ], tot_loss[loss=0.09689, simple_loss=0.1131, pruned_loss=0.02913, audio_tagging_loss=0.01121, over 3045641.28 frames. ], batch size: 59, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:45:23,649 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 22:45:28,812 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 22:45:41,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=444646.6666666667, ans=0.1 2023-11-18 22:46:04,535 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 6600, loss[loss=0.09647, simple_loss=0.1197, pruned_loss=0.02812, audio_tagging_loss=0.008519, over 15173.00 frames. ], tot_loss[loss=0.09756, simple_loss=0.1144, pruned_loss=0.02936, audio_tagging_loss=0.01103, over 3045391.24 frames. 
], batch size: 58, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:46:07,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=444780.0, ans=0.1 2023-11-18 22:46:16,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=444846.6666666667, ans=0.1 2023-11-18 22:46:21,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=444846.6666666667, ans=0.125 2023-11-18 22:46:30,660 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.628e-03 2023-11-18 22:46:44,722 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 9.179e+01 9.898e+01 1.109e+02 1.412e+02, threshold=1.980e+02, percent-clipped=0.0 2023-11-18 22:46:48,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=445046.6666666667, ans=0.0 2023-11-18 22:46:59,490 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 6650, loss[loss=0.05997, simple_loss=0.06619, pruned_loss=0.01147, audio_tagging_loss=0.01541, over 15906.00 frames. ], tot_loss[loss=0.09654, simple_loss=0.1129, pruned_loss=0.02906, audio_tagging_loss=0.01105, over 3039504.56 frames. ], batch size: 61, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:47:25,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=445246.6666666667, ans=0.125 2023-11-18 22:47:28,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=445246.6666666667, ans=0.0 2023-11-18 22:47:29,432 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.07 vs. limit=22.5 2023-11-18 22:47:33,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=445313.3333333333, ans=0.125 2023-11-18 22:47:40,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=445313.3333333333, ans=0.125 2023-11-18 22:47:48,912 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.14 vs. limit=15.0 2023-11-18 22:47:49,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=445380.0, ans=0.1 2023-11-18 22:47:54,892 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 6700, loss[loss=0.1059, simple_loss=0.1304, pruned_loss=0.03161, audio_tagging_loss=0.009078, over 14600.00 frames. ], tot_loss[loss=0.0958, simple_loss=0.1119, pruned_loss=0.02883, audio_tagging_loss=0.01101, over 3039400.54 frames. ], batch size: 56, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:47:59,022 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.94 vs. 
limit=22.5 2023-11-18 22:48:27,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=445646.6666666667, ans=0.125 2023-11-18 22:48:36,160 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 9.188e+01 9.958e+01 1.118e+02 1.458e+02, threshold=1.992e+02, percent-clipped=0.0 2023-11-18 22:48:47,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=445713.3333333333, ans=0.125 2023-11-18 22:48:51,579 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 6750, loss[loss=0.1115, simple_loss=0.1303, pruned_loss=0.0386, audio_tagging_loss=0.007779, over 14987.00 frames. ], tot_loss[loss=0.0959, simple_loss=0.1118, pruned_loss=0.02891, audio_tagging_loss=0.01108, over 3043056.64 frames. ], batch size: 57, lr: 1.14e-02, grad_scale: 16.0 2023-11-18 22:48:51,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=445780.0, ans=0.0 2023-11-18 22:49:00,210 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=445780.0, ans=0.1 2023-11-18 22:49:03,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=445846.6666666667, ans=0.2 2023-11-18 22:49:07,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=445846.6666666667, ans=0.2 2023-11-18 22:49:11,223 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.31 vs. limit=6.0 2023-11-18 22:49:13,367 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.59 vs. limit=10.0 2023-11-18 22:49:24,708 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 22:49:36,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=446046.6666666667, ans=0.0 2023-11-18 22:49:46,736 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 6800, loss[loss=0.08556, simple_loss=0.09263, pruned_loss=0.02848, audio_tagging_loss=0.01076, over 14675.00 frames. ], tot_loss[loss=0.09606, simple_loss=0.1123, pruned_loss=0.02892, audio_tagging_loss=0.01097, over 3034553.10 frames. 
], batch size: 57, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:49:48,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=446113.3333333333, ans=0.1 2023-11-18 22:50:05,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=446180.0, ans=0.125 2023-11-18 22:50:27,944 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.487e+01 8.920e+01 9.995e+01 1.137e+02 1.788e+02, threshold=1.999e+02, percent-clipped=0.0 2023-11-18 22:50:29,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=446313.3333333333, ans=0.0 2023-11-18 22:50:33,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=446380.0, ans=0.04949747468305833 2023-11-18 22:50:36,083 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.98 vs. limit=15.0 2023-11-18 22:50:41,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=446446.6666666667, ans=0.09899494936611666 2023-11-18 22:50:42,356 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 6850, loss[loss=0.1296, simple_loss=0.1599, pruned_loss=0.04334, audio_tagging_loss=0.006292, over 15842.00 frames. ], tot_loss[loss=0.09567, simple_loss=0.1117, pruned_loss=0.02878, audio_tagging_loss=0.01103, over 3023407.71 frames. ], batch size: 56, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:51:15,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=446646.6666666667, ans=0.125 2023-11-18 22:51:33,365 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.20 vs. limit=15.0 2023-11-18 22:51:39,209 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 6900, loss[loss=0.07821, simple_loss=0.0929, pruned_loss=0.02237, audio_tagging_loss=0.009387, over 14523.00 frames. ], tot_loss[loss=0.09618, simple_loss=0.1126, pruned_loss=0.02888, audio_tagging_loss=0.01098, over 3037659.28 frames. ], batch size: 55, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:51:49,323 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.32 vs. limit=15.0 2023-11-18 22:51:49,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=446846.6666666667, ans=0.125 2023-11-18 22:51:52,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=446846.6666666667, ans=0.0 2023-11-18 22:52:04,568 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0 2023-11-18 22:52:19,849 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 9.182e+01 9.955e+01 1.058e+02 1.430e+02, threshold=1.991e+02, percent-clipped=0.0 2023-11-18 22:52:20,933 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
2023-11-18 22:52:34,141 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 6950, loss[loss=0.0876, simple_loss=0.1032, pruned_loss=0.02492, audio_tagging_loss=0.01106, over 13993.00 frames. ], tot_loss[loss=0.09611, simple_loss=0.1121, pruned_loss=0.02901, audio_tagging_loss=0.01103, over 3036375.25 frames. ], batch size: 52, lr: 1.14e-02, grad_scale: 32.0
2023-11-18 22:53:09,661 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.42 vs. limit=6.0
2023-11-18 22:53:29,845 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 7000, loss[loss=0.1103, simple_loss=0.1357, pruned_loss=0.03368, audio_tagging_loss=0.00877, over 15921.00 frames. ], tot_loss[loss=0.09598, simple_loss=0.1121, pruned_loss=0.02891, audio_tagging_loss=0.01105, over 3043161.47 frames. ], batch size: 57, lr: 1.14e-02, grad_scale: 32.0
2023-11-18 22:53:35,552 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.71 vs. limit=15.0
2023-11-18 22:53:57,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=447580.0, ans=0.025
2023-11-18 22:53:57,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=447580.0, ans=0.1
2023-11-18 22:54:02,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=447646.6666666667, ans=0.1
2023-11-18 22:54:10,526 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.357e+01 8.734e+01 9.498e+01 1.045e+02 1.881e+02, threshold=1.900e+02, percent-clipped=0.0
2023-11-18 22:54:14,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=447713.3333333333, ans=0.2
2023-11-18 22:54:25,863 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 7050, loss[loss=0.1067, simple_loss=0.1373, pruned_loss=0.02985, audio_tagging_loss=0.008216, over 16528.00 frames. ], tot_loss[loss=0.09579, simple_loss=0.1118, pruned_loss=0.02874, audio_tagging_loss=0.01112, over 3046430.47 frames. ], batch size: 57, lr: 1.14e-02, grad_scale: 32.0
2023-11-18 22:54:56,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=447913.3333333333, ans=0.1
2023-11-18 22:55:21,692 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 7100, loss[loss=0.1203, simple_loss=0.1462, pruned_loss=0.0375, audio_tagging_loss=0.009718, over 16137.00 frames. ], tot_loss[loss=0.09529, simple_loss=0.1112, pruned_loss=0.02841, audio_tagging_loss=0.01127, over 3051578.15 frames. ], batch size: 60, lr: 1.13e-02, grad_scale: 32.0
2023-11-18 22:55:57,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=448313.3333333333, ans=0.2
2023-11-18 22:56:02,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=448313.3333333333, ans=0.1
2023-11-18 22:56:03,001 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.229e+01 9.011e+01 9.786e+01 1.101e+02 1.464e+02, threshold=1.957e+02, percent-clipped=0.0
2023-11-18 22:56:16,826 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 7150, loss[loss=0.1104, simple_loss=0.1349, pruned_loss=0.03214, audio_tagging_loss=0.01083, over 15134.00 frames. ], tot_loss[loss=0.09603, simple_loss=0.1119, pruned_loss=0.0287, audio_tagging_loss=0.01141, over 3053221.97 frames. ], batch size: 57, lr: 1.13e-02, grad_scale: 32.0
2023-11-18 22:56:17,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=448446.6666666667, ans=0.125
2023-11-18 22:56:31,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=448513.3333333333, ans=0.0
2023-11-18 22:56:33,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=448513.3333333333, ans=0.0
2023-11-18 22:56:49,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=448580.0, ans=0.0
2023-11-18 22:57:02,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=448713.3333333333, ans=0.125
2023-11-18 22:57:13,664 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 7200, loss[loss=0.09829, simple_loss=0.1171, pruned_loss=0.02745, audio_tagging_loss=0.01228, over 14640.00 frames. ], tot_loss[loss=0.09592, simple_loss=0.1116, pruned_loss=0.02867, audio_tagging_loss=0.01147, over 3052340.68 frames. ], batch size: 53, lr: 1.13e-02, grad_scale: 32.0
2023-11-18 22:57:43,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=448913.3333333333, ans=0.1
2023-11-18 22:57:52,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=448980.0, ans=0.125
2023-11-18 22:57:52,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=448980.0, ans=0.0
2023-11-18 22:57:54,994 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.349e+01 8.744e+01 9.483e+01 1.054e+02 1.266e+02, threshold=1.897e+02, percent-clipped=0.0
2023-11-18 22:58:07,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=449113.3333333333, ans=0.0
2023-11-18 22:58:08,831 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 7250, loss[loss=0.1124, simple_loss=0.1426, pruned_loss=0.03241, audio_tagging_loss=0.008703, over 15723.00 frames. ], tot_loss[loss=0.0954, simple_loss=0.111, pruned_loss=0.02834, audio_tagging_loss=0.01158, over 3053015.94 frames. ], batch size: 59, lr: 1.13e-02, grad_scale: 32.0
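The loss[...] fields in the train_asr.py lines are consistent with a weighted sum of the pruned-transducer terms and the audio-tagging term: for batch 7250 above, 0.5 * 0.1426 + 0.03241 + 0.008703 ~= 0.1124, the logged loss. A sketch under that assumption; the scales are inferred from the logged numbers, not read from the code:

    def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                      simple_scale=0.5, tagging_scale=1.0):
        # Inferred weighting: reproduces loss=0.1124 from simple_loss=0.1426,
        # pruned_loss=0.03241, audio_tagging_loss=0.008703 at batch 7250,
        # and loss=0.1115 from the batch 6750 entry the same way.
        return (simple_scale * simple_loss
                + pruned_loss
                + tagging_scale * audio_tagging_loss)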
2023-11-18 22:58:10,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=449113.3333333333, ans=0.2
2023-11-18 22:58:22,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=449180.0, ans=0.125
2023-11-18 22:58:32,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=449246.6666666667, ans=0.1
2023-11-18 22:58:37,980 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.32 vs. limit=15.0
2023-11-18 22:59:04,606 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 7300, loss[loss=0.09452, simple_loss=0.09797, pruned_loss=0.03366, audio_tagging_loss=0.01187, over 14466.00 frames. ], tot_loss[loss=0.09522, simple_loss=0.1111, pruned_loss=0.02831, audio_tagging_loss=0.01138, over 3047472.36 frames. ], batch size: 55, lr: 1.13e-02, grad_scale: 32.0
2023-11-18 22:59:25,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=449513.3333333333, ans=0.2
2023-11-18 22:59:27,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=449580.0, ans=0.0
2023-11-18 22:59:45,852 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.627e+01 8.690e+01 9.830e+01 1.104e+02 1.354e+02, threshold=1.966e+02, percent-clipped=0.0
2023-11-18 22:59:47,189 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=449646.6666666667, ans=0.125
2023-11-18 22:59:48,096 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 22:59:54,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=449713.3333333333, ans=0.125
2023-11-18 23:00:00,639 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 7350, loss[loss=0.1089, simple_loss=0.1254, pruned_loss=0.03396, audio_tagging_loss=0.01229, over 15980.00 frames. ], tot_loss[loss=0.09497, simple_loss=0.1109, pruned_loss=0.02829, audio_tagging_loss=0.01123, over 3045671.44 frames. ], batch size: 61, lr: 1.13e-02, grad_scale: 32.0
2023-11-18 23:00:05,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=449780.0, ans=0.0
2023-11-18 23:00:10,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=449846.6666666667, ans=0.125
2023-11-18 23:00:16,455 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.13 vs. limit=15.0
2023-11-18 23:00:19,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=449846.6666666667, ans=0.125
2023-11-18 23:00:21,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=449913.3333333333, ans=0.125
2023-11-18 23:00:22,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=449913.3333333333, ans=0.09899494936611666
2023-11-18 23:00:25,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=449913.3333333333, ans=0.1
2023-11-18 23:00:55,968 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 7400, loss[loss=0.09045, simple_loss=0.1142, pruned_loss=0.0249, audio_tagging_loss=0.008447, over 15218.00 frames. ], tot_loss[loss=0.09448, simple_loss=0.1107, pruned_loss=0.02805, audio_tagging_loss=0.01106, over 3050540.10 frames. ], batch size: 57, lr: 1.13e-02, grad_scale: 32.0
2023-11-18 23:00:57,687 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.39 vs. limit=10.0
2023-11-18 23:00:59,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=450113.3333333333, ans=0.125
2023-11-18 23:01:02,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=450113.3333333333, ans=0.0
2023-11-18 23:01:15,364 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=450180.0, ans=0.1
2023-11-18 23:01:34,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=450313.3333333333, ans=0.125
2023-11-18 23:01:34,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=450313.3333333333, ans=0.0
2023-11-18 23:01:37,426 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.674e+01 9.086e+01 1.005e+02 1.143e+02 1.555e+02, threshold=2.009e+02, percent-clipped=0.0
2023-11-18 23:01:40,321 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.93 vs. limit=22.5
2023-11-18 23:01:43,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=450380.0, ans=0.125
2023-11-18 23:01:44,114 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 23:01:48,814 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.23 vs. limit=15.0
2023-11-18 23:01:51,873 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 7450, loss[loss=0.07513, simple_loss=0.0952, pruned_loss=0.01684, audio_tagging_loss=0.0107, over 16663.00 frames. ], tot_loss[loss=0.09423, simple_loss=0.11, pruned_loss=0.02802, audio_tagging_loss=0.01119, over 3056040.28 frames. ], batch size: 64, lr: 1.13e-02, grad_scale: 32.0
2023-11-18 23:02:05,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=450513.3333333333, ans=0.1
2023-11-18 23:02:11,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=450513.3333333333, ans=0.1
2023-11-18 23:02:30,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=450646.6666666667, ans=0.0
2023-11-18 23:02:32,933 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 23:02:34,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=450646.6666666667, ans=0.0
2023-11-18 23:02:46,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=450780.0, ans=0.1
2023-11-18 23:02:47,024 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.39 vs. limit=15.0
2023-11-18 23:02:47,573 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 7500, loss[loss=0.1058, simple_loss=0.1326, pruned_loss=0.03117, audio_tagging_loss=0.008326, over 16659.00 frames. ], tot_loss[loss=0.09546, simple_loss=0.1116, pruned_loss=0.02863, audio_tagging_loss=0.01104, over 3054605.05 frames. ], batch size: 62, lr: 1.13e-02, grad_scale: 16.0
2023-11-18 23:03:07,056 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.03 vs. limit=15.0
2023-11-18 23:03:21,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=450980.0, ans=0.125
2023-11-18 23:03:21,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=450980.0, ans=0.1
2023-11-18 23:03:30,365 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.900e+01 8.742e+01 9.569e+01 1.067e+02 1.631e+02, threshold=1.914e+02, percent-clipped=0.0
2023-11-18 23:03:40,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=451046.6666666667, ans=0.125
2023-11-18 23:03:41,761 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 23:03:43,629 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 7550, loss[loss=0.0955, simple_loss=0.1107, pruned_loss=0.02973, audio_tagging_loss=0.01043, over 13226.00 frames. ], tot_loss[loss=0.0959, simple_loss=0.1122, pruned_loss=0.02881, audio_tagging_loss=0.01101, over 3054992.11 frames. ], batch size: 52, lr: 1.13e-02, grad_scale: 16.0
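grad_scale in the loss lines is the fp16 loss-scaling factor, and it moves in powers of two (16.0 -> 32.0 earlier in the log, back to 16.0 around batch 7500 above). That pattern matches a standard dynamic scaler that doubles the scale after a run of overflow-free steps and halves it when gradients overflow; a minimal sketch, with an assumed growth interval, not the scaler actually used by train_asr.py:

    class DynamicGradScaler:
        def __init__(self, init_scale=16.0, growth_interval=2000):
            self.scale = init_scale                # multiplies the fp16 loss
            self.growth_interval = growth_interval # assumed value
            self._steps_since_overflow = 0

        def update(self, found_inf: bool):
            if found_inf:
                # Overflow: halve the scale and skip the optimizer step.
                self.scale = max(self.scale / 2.0, 1.0)
                self._steps_since_overflow = 0
            else:
                self._steps_since_overflow += 1
                if self._steps_since_overflow % self.growth_interval == 0:
                    self.scale *= 2.0              # e.g. 16.0 -> 32.0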
2023-11-18 23:03:44,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=451113.3333333333, ans=0.0
2023-11-18 23:03:59,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=451180.0, ans=0.125
2023-11-18 23:04:07,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=451246.6666666667, ans=0.1
2023-11-18 23:04:08,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=451246.6666666667, ans=0.0
2023-11-18 23:04:16,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=451313.3333333333, ans=0.125
2023-11-18 23:04:23,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=451313.3333333333, ans=0.5
2023-11-18 23:04:37,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=451446.6666666667, ans=0.125
2023-11-18 23:04:38,065 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 7600, loss[loss=0.09792, simple_loss=0.1159, pruned_loss=0.0309, audio_tagging_loss=0.009059, over 15276.00 frames. ], tot_loss[loss=0.09616, simple_loss=0.1123, pruned_loss=0.02901, audio_tagging_loss=0.01102, over 3057669.62 frames. ], batch size: 58, lr: 1.13e-02, grad_scale: 32.0
2023-11-18 23:04:39,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=451446.6666666667, ans=0.1
2023-11-18 23:04:45,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=451446.6666666667, ans=0.1
2023-11-18 23:04:48,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=451513.3333333333, ans=0.125
2023-11-18 23:05:08,476 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 23:05:20,678 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.440e+01 8.942e+01 9.830e+01 1.127e+02 1.912e+02, threshold=1.966e+02, percent-clipped=0.0
2023-11-18 23:05:33,287 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 7650, loss[loss=0.1268, simple_loss=0.1463, pruned_loss=0.04235, audio_tagging_loss=0.01127, over 15688.00 frames. ], tot_loss[loss=0.09535, simple_loss=0.1114, pruned_loss=0.02868, audio_tagging_loss=0.01098, over 3055367.44 frames. ], batch size: 56, lr: 1.13e-02, grad_scale: 16.0
2023-11-18 23:05:34,101 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 23:05:55,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=451913.3333333333, ans=0.125
2023-11-18 23:05:56,673 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 23:06:03,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=451913.3333333333, ans=0.07
2023-11-18 23:06:03,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=451913.3333333333, ans=0.125
2023-11-18 23:06:04,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=451913.3333333333, ans=0.0
2023-11-18 23:06:07,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=451980.0, ans=0.0
2023-11-18 23:06:16,951 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.23 vs. limit=6.0
2023-11-18 23:06:29,830 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 7700, loss[loss=0.1032, simple_loss=0.1218, pruned_loss=0.03224, audio_tagging_loss=0.0101, over 15067.00 frames. ], tot_loss[loss=0.09552, simple_loss=0.1118, pruned_loss=0.02875, audio_tagging_loss=0.01089, over 3048975.06 frames. ], batch size: 57, lr: 1.13e-02, grad_scale: 16.0
2023-11-18 23:06:37,886 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=22.5
2023-11-18 23:06:52,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=452246.6666666667, ans=0.0
2023-11-18 23:07:13,150 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.026e+01 8.646e+01 9.561e+01 1.050e+02 1.437e+02, threshold=1.912e+02, percent-clipped=0.0
2023-11-18 23:07:24,964 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 7750, loss[loss=0.1351, simple_loss=0.1587, pruned_loss=0.04736, audio_tagging_loss=0.008427, over 15426.00 frames. ], tot_loss[loss=0.09616, simple_loss=0.1123, pruned_loss=0.02907, audio_tagging_loss=0.01096, over 3053918.07 frames. ], batch size: 55, lr: 1.13e-02, grad_scale: 16.0
2023-11-18 23:07:29,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=452446.6666666667, ans=0.04949747468305833
2023-11-18 23:07:35,915 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.50 vs. limit=10.0
2023-11-18 23:07:47,867 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.91 vs. limit=10.0
2023-11-18 23:07:56,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=452580.0, ans=0.025
2023-11-18 23:08:13,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=452713.3333333333, ans=0.0
2023-11-18 23:08:19,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=452780.0, ans=0.125
2023-11-18 23:08:20,992 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 7800, loss[loss=0.1051, simple_loss=0.1202, pruned_loss=0.03076, audio_tagging_loss=0.01423, over 16125.00 frames. ], tot_loss[loss=0.09685, simple_loss=0.113, pruned_loss=0.02932, audio_tagging_loss=0.01103, over 3056522.61 frames. ], batch size: 62, lr: 1.13e-02, grad_scale: 16.0
2023-11-18 23:08:28,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=452780.0, ans=0.0
2023-11-18 23:08:34,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=452846.6666666667, ans=15.0
2023-11-18 23:08:37,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=452846.6666666667, ans=0.0
2023-11-18 23:08:39,437 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.05 vs. limit=15.0
2023-11-18 23:08:42,682 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.95 vs. limit=22.5
2023-11-18 23:08:48,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=452913.3333333333, ans=0.125
2023-11-18 23:08:57,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=452980.0, ans=0.125
2023-11-18 23:09:04,315 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 8.766e+01 9.659e+01 1.074e+02 1.731e+02, threshold=1.932e+02, percent-clipped=0.0
2023-11-18 23:09:08,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=453046.6666666667, ans=0.0
2023-11-18 23:09:14,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=453046.6666666667, ans=0.125
2023-11-18 23:09:15,408 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.46 vs. limit=15.0
2023-11-18 23:09:17,034 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 7850, loss[loss=0.1002, simple_loss=0.1116, pruned_loss=0.03341, audio_tagging_loss=0.01102, over 15713.00 frames. ], tot_loss[loss=0.09683, simple_loss=0.113, pruned_loss=0.02922, audio_tagging_loss=0.01111, over 3057105.51 frames. ], batch size: 56, lr: 1.13e-02, grad_scale: 16.0
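The scaling.py:213 lines print the current value (ans=...) of a ScheduledFloat: a hyperparameter such as a dropout probability or skip rate that is a function of batch_count rather than a constant. A plausible reading is piecewise-linear interpolation between (batch_count, value) breakpoints; a sketch under that assumption, not the class as implemented in scaling.py:

    class ScheduledFloat:
        # Sketch: value is piecewise-linear in batch_count, so e.g. a
        # skip-rate can decay from 0.5 early in training down to 0.0 later.
        def __init__(self, *points):  # points: (batch_count, value) pairs
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # Hypothetical breakpoints: fully decayed by this point in training,
    # matching the many ans=0.0 skip-rate entries above.
    skip_rate = ScheduledFloat((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
    print(skip_rate.value(452780.0))  # -> 0.0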
2023-11-18 23:09:28,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=453180.0, ans=0.0
2023-11-18 23:09:42,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=453246.6666666667, ans=0.125
2023-11-18 23:09:44,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=453246.6666666667, ans=0.125
2023-11-18 23:10:14,374 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 7900, loss[loss=0.1058, simple_loss=0.1249, pruned_loss=0.0325, audio_tagging_loss=0.01088, over 16981.00 frames. ], tot_loss[loss=0.09744, simple_loss=0.1137, pruned_loss=0.02941, audio_tagging_loss=0.01119, over 3055523.38 frames. ], batch size: 61, lr: 1.13e-02, grad_scale: 16.0
2023-11-18 23:10:16,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=453446.6666666667, ans=0.1
2023-11-18 23:10:57,622 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.507e+01 8.743e+01 9.470e+01 1.071e+02 1.653e+02, threshold=1.894e+02, percent-clipped=0.0
2023-11-18 23:11:00,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=453713.3333333333, ans=0.035
2023-11-18 23:11:02,427 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0
2023-11-18 23:11:04,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=453713.3333333333, ans=0.125
2023-11-18 23:11:09,814 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 7950, loss[loss=0.09132, simple_loss=0.09358, pruned_loss=0.03195, audio_tagging_loss=0.01257, over 14669.00 frames. ], tot_loss[loss=0.09683, simple_loss=0.1129, pruned_loss=0.02912, audio_tagging_loss=0.01126, over 3050688.52 frames. ], batch size: 57, lr: 1.13e-02, grad_scale: 16.0
2023-11-18 23:11:22,623 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 23:11:24,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=453846.6666666667, ans=0.2
2023-11-18 23:11:44,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=453980.0, ans=0.0
2023-11-18 23:11:53,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=454046.6666666667, ans=0.07
2023-11-18 23:12:03,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=454046.6666666667, ans=0.0
2023-11-18 23:12:04,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=454113.3333333333, ans=0.125
2023-11-18 23:12:05,805 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 8000, loss[loss=0.09782, simple_loss=0.1052, pruned_loss=0.03299, audio_tagging_loss=0.01225, over 14109.00 frames. ], tot_loss[loss=0.09599, simple_loss=0.1118, pruned_loss=0.02868, audio_tagging_loss=0.01143, over 3047699.27 frames. ], batch size: 55, lr: 1.13e-02, grad_scale: 32.0
2023-11-18 23:12:06,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=454113.3333333333, ans=0.125
2023-11-18 23:12:07,397 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.26 vs. limit=12.0
2023-11-18 23:12:15,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=454180.0, ans=0.0
2023-11-18 23:12:28,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=454246.6666666667, ans=0.05
2023-11-18 23:12:28,288 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=15.0
2023-11-18 23:12:44,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=454313.3333333333, ans=0.0
2023-11-18 23:12:46,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=454313.3333333333, ans=0.0
2023-11-18 23:12:48,897 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.235e+01 8.670e+01 9.475e+01 1.059e+02 1.539e+02, threshold=1.895e+02, percent-clipped=0.0
2023-11-18 23:13:00,541 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 8050, loss[loss=0.07449, simple_loss=0.08194, pruned_loss=0.0207, audio_tagging_loss=0.01282, over 14952.00 frames. ], tot_loss[loss=0.09665, simple_loss=0.1123, pruned_loss=0.0291, audio_tagging_loss=0.0114, over 3046555.46 frames. ], batch size: 58, lr: 1.13e-02, grad_scale: 32.0
2023-11-18 23:13:03,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=454446.6666666667, ans=0.125
2023-11-18 23:13:16,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=454513.3333333333, ans=0.1
2023-11-18 23:13:22,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=454580.0, ans=0.0
2023-11-18 23:13:24,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=454580.0, ans=0.0
2023-11-18 23:13:28,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=454580.0, ans=0.125
2023-11-18 23:13:35,058 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.30 vs. limit=12.0
2023-11-18 23:13:40,448 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0
2023-11-18 23:13:46,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=454713.3333333333, ans=0.0
2023-11-18 23:13:56,072 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 8100, loss[loss=0.09544, simple_loss=0.1001, pruned_loss=0.03263, audio_tagging_loss=0.01278, over 15177.00 frames. ], tot_loss[loss=0.09698, simple_loss=0.1129, pruned_loss=0.02918, audio_tagging_loss=0.01133, over 3045420.05 frames. ], batch size: 58, lr: 1.13e-02, grad_scale: 32.0
2023-11-18 23:13:57,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=454780.0, ans=0.1
2023-11-18 23:14:23,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=454913.3333333333, ans=0.125
2023-11-18 23:14:25,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=454913.3333333333, ans=0.125
2023-11-18 23:14:37,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=454980.0, ans=0.1
2023-11-18 23:14:39,584 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.949e+01 9.117e+01 9.858e+01 1.091e+02 1.353e+02, threshold=1.972e+02, percent-clipped=0.0
2023-11-18 23:14:52,353 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 8150, loss[loss=0.1056, simple_loss=0.1259, pruned_loss=0.03507, audio_tagging_loss=0.007609, over 15876.00 frames. ], tot_loss[loss=0.0977, simple_loss=0.1141, pruned_loss=0.02956, audio_tagging_loss=0.01108, over 3043694.83 frames. ], batch size: 59, lr: 1.13e-02, grad_scale: 32.0
2023-11-18 23:14:52,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=455113.3333333333, ans=0.0
2023-11-18 23:14:53,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=455113.3333333333, ans=0.0
2023-11-18 23:15:00,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=455113.3333333333, ans=0.0
2023-11-18 23:15:05,371 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.56 vs. limit=22.5
2023-11-18 23:15:13,445 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=455246.6666666667, ans=0.0
2023-11-18 23:15:14,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=455246.6666666667, ans=0.0
2023-11-18 23:15:15,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=455246.6666666667, ans=0.0
2023-11-18 23:15:23,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=455246.6666666667, ans=0.125
2023-11-18 23:15:47,071 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 23:15:48,104 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 8200, loss[loss=0.08442, simple_loss=0.103, pruned_loss=0.02276, audio_tagging_loss=0.01018, over 15715.00 frames. ], tot_loss[loss=0.09863, simple_loss=0.1154, pruned_loss=0.03004, audio_tagging_loss=0.01089, over 3049021.40 frames. ], batch size: 60, lr: 1.13e-02, grad_scale: 32.0
2023-11-18 23:15:49,529 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.19 vs. limit=15.0
2023-11-18 23:16:01,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=455513.3333333333, ans=0.1
2023-11-18 23:16:16,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=455580.0, ans=0.0
2023-11-18 23:16:30,919 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.615e+01 8.905e+01 9.848e+01 1.096e+02 1.904e+02, threshold=1.970e+02, percent-clipped=0.0
2023-11-18 23:16:34,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=455713.3333333333, ans=0.1
2023-11-18 23:16:37,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=455713.3333333333, ans=0.0
2023-11-18 23:16:40,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=455713.3333333333, ans=0.2
2023-11-18 23:16:41,036 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.18 vs. limit=10.0
2023-11-18 23:16:42,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=455780.0, ans=0.1
2023-11-18 23:16:42,990 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 8250, loss[loss=0.1112, simple_loss=0.1289, pruned_loss=0.0335, audio_tagging_loss=0.01322, over 14609.00 frames. ], tot_loss[loss=0.0974, simple_loss=0.114, pruned_loss=0.02947, audio_tagging_loss=0.01095, over 3053138.94 frames. ], batch size: 54, lr: 1.13e-02, grad_scale: 32.0
2023-11-18 23:16:46,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=455780.0, ans=0.125
2023-11-18 23:17:08,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=455913.3333333333, ans=0.0
2023-11-18 23:17:18,366 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.67 vs. limit=15.0
2023-11-18 23:17:25,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=455980.0, ans=0.015
2023-11-18 23:17:35,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=456046.6666666667, ans=0.05
2023-11-18 23:17:38,209 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 8300, loss[loss=0.06643, simple_loss=0.07384, pruned_loss=0.01491, audio_tagging_loss=0.0146, over 14875.00 frames. ], tot_loss[loss=0.09678, simple_loss=0.1132, pruned_loss=0.02922, audio_tagging_loss=0.01097, over 3051789.48 frames. ], batch size: 59, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:18:07,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=456246.6666666667, ans=0.125
2023-11-18 23:18:10,236 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.80 vs. limit=22.5
2023-11-18 23:18:10,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=456313.3333333333, ans=0.125
2023-11-18 23:18:13,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=456313.3333333333, ans=0.0
2023-11-18 23:18:21,238 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.810e+01 9.840e+01 1.082e+02 1.589e+02, threshold=1.968e+02, percent-clipped=0.0
2023-11-18 23:18:33,342 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 8350, loss[loss=0.1071, simple_loss=0.1172, pruned_loss=0.03541, audio_tagging_loss=0.0131, over 15366.00 frames. ], tot_loss[loss=0.09676, simple_loss=0.1134, pruned_loss=0.0292, audio_tagging_loss=0.01089, over 3056007.88 frames. ], batch size: 60, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:19:04,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=456580.0, ans=0.0
2023-11-18 23:19:17,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=456713.3333333333, ans=0.1
2023-11-18 23:19:28,803 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 8400, loss[loss=0.1013, simple_loss=0.1042, pruned_loss=0.0358, audio_tagging_loss=0.01343, over 14677.00 frames. ], tot_loss[loss=0.09613, simple_loss=0.1126, pruned_loss=0.02886, audio_tagging_loss=0.01099, over 3056210.89 frames. ], batch size: 59, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:19:37,947 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.06 vs. limit=10.0
2023-11-18 23:19:42,732 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=456846.6666666667, ans=0.1
2023-11-18 23:19:54,046 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.08 vs. limit=8.0
2023-11-18 23:20:12,054 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.818e+01 8.924e+01 9.865e+01 1.104e+02 3.626e+02, threshold=1.973e+02, percent-clipped=1.0
2023-11-18 23:20:16,696 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.18 vs. limit=22.5
2023-11-18 23:20:24,657 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 8450, loss[loss=0.08071, simple_loss=0.09021, pruned_loss=0.02211, audio_tagging_loss=0.0135, over 14669.00 frames. ], tot_loss[loss=0.09642, simple_loss=0.1129, pruned_loss=0.02907, audio_tagging_loss=0.01093, over 3052935.29 frames. ], batch size: 56, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:20:24,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=457113.3333333333, ans=0.125
2023-11-18 23:20:35,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=457180.0, ans=0.07
2023-11-18 23:20:49,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=457246.6666666667, ans=0.125
2023-11-18 23:20:50,397 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.30 vs. limit=15.0
2023-11-18 23:21:19,984 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 8500, loss[loss=0.1212, simple_loss=0.1454, pruned_loss=0.03818, audio_tagging_loss=0.01031, over 15182.00 frames. ], tot_loss[loss=0.09764, simple_loss=0.1145, pruned_loss=0.02953, audio_tagging_loss=0.01087, over 3055185.96 frames. ], batch size: 57, lr: 1.12e-02, grad_scale: 16.0
2023-11-18 23:21:29,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=457446.6666666667, ans=0.125
2023-11-18 23:21:34,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=457513.3333333333, ans=0.1
2023-11-18 23:21:37,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=457513.3333333333, ans=0.0
2023-11-18 23:21:40,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=457513.3333333333, ans=0.0
2023-11-18 23:21:58,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=457646.6666666667, ans=0.1
2023-11-18 23:22:04,503 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.708e+01 8.613e+01 9.508e+01 1.037e+02 1.516e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-18 23:22:15,589 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 8550, loss[loss=0.1047, simple_loss=0.1213, pruned_loss=0.03329, audio_tagging_loss=0.01077, over 16074.00 frames. ], tot_loss[loss=0.09783, simple_loss=0.1145, pruned_loss=0.02969, audio_tagging_loss=0.01088, over 3050314.47 frames. ], batch size: 58, lr: 1.12e-02, grad_scale: 16.0
2023-11-18 23:22:22,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=457780.0, ans=0.125
2023-11-18 23:22:30,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=457846.6666666667, ans=8.0
2023-11-18 23:22:37,579 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.94 vs. limit=15.0
2023-11-18 23:22:54,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=457980.0, ans=0.125
2023-11-18 23:22:56,914 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=457980.0, ans=0.1
2023-11-18 23:23:07,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=458046.6666666667, ans=0.0
2023-11-18 23:23:08,186 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.58 vs. limit=15.0
2023-11-18 23:23:11,596 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 8600, loss[loss=0.09717, simple_loss=0.08875, pruned_loss=0.03379, audio_tagging_loss=0.019, over 15210.00 frames. ], tot_loss[loss=0.09738, simple_loss=0.1138, pruned_loss=0.02947, audio_tagging_loss=0.011, over 3054845.13 frames. ], batch size: 57, lr: 1.12e-02, grad_scale: 16.0
2023-11-18 23:23:32,746 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.82 vs. limit=15.0
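Each Whitening line compares a per-module whiteness metric against a scheduled limit (metric=12.82 vs. limit=15.0 just above); entries are printed when the metric approaches or exceeds the limit, at which point scaling.py applies a corrective pressure pushing the activations back toward an isotropic covariance. One simple metric with the right qualitative behaviour (1.0 for perfectly white features, larger for anisotropic ones) is sketched below; the exact formula in scaling.py may differ:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (..., num_channels); flatten everything but the channel dim.
        x = x.reshape(-1, x.shape[-1])
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]          # channel covariance
        c = cov.shape[0]
        # Equals 1.0 when cov is a multiple of the identity; grows with the
        # eigenvalue spread, i.e. as the features become less "white".
        return c * (cov * cov).sum() / cov.trace() ** 2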
2023-11-18 23:23:32,792 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.56 vs. limit=15.0
2023-11-18 23:23:40,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=458246.6666666667, ans=0.07
2023-11-18 23:23:40,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=458246.6666666667, ans=0.2
2023-11-18 23:23:56,308 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 8.729e+01 9.610e+01 1.061e+02 1.458e+02, threshold=1.922e+02, percent-clipped=0.0
2023-11-18 23:24:06,890 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 8650, loss[loss=0.06649, simple_loss=0.07656, pruned_loss=0.01736, audio_tagging_loss=0.01085, over 15753.00 frames. ], tot_loss[loss=0.09804, simple_loss=0.1146, pruned_loss=0.0297, audio_tagging_loss=0.01104, over 3050699.67 frames. ], batch size: 60, lr: 1.12e-02, grad_scale: 16.0
2023-11-18 23:24:12,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=458446.6666666667, ans=0.5
2023-11-18 23:24:41,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=458646.6666666667, ans=10.0
2023-11-18 23:24:47,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=458646.6666666667, ans=0.09899494936611666
2023-11-18 23:24:58,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=458713.3333333333, ans=0.09899494936611666
2023-11-18 23:24:58,617 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.69 vs. limit=15.0
2023-11-18 23:25:02,862 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 8700, loss[loss=0.1121, simple_loss=0.1358, pruned_loss=0.0366, audio_tagging_loss=0.007611, over 15314.00 frames. ], tot_loss[loss=0.09775, simple_loss=0.1142, pruned_loss=0.0295, audio_tagging_loss=0.01115, over 3054832.59 frames. ], batch size: 54, lr: 1.12e-02, grad_scale: 16.0
2023-11-18 23:25:19,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=458846.6666666667, ans=0.0
2023-11-18 23:25:47,298 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.282e+01 9.228e+01 9.960e+01 1.094e+02 1.937e+02, threshold=1.992e+02, percent-clipped=1.0
2023-11-18 23:25:48,926 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.59 vs. limit=15.0
2023-11-18 23:25:50,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=459046.6666666667, ans=0.125
2023-11-18 23:25:58,429 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 8750, loss[loss=0.09254, simple_loss=0.1067, pruned_loss=0.02553, audio_tagging_loss=0.01369, over 14933.00 frames. ], tot_loss[loss=0.09785, simple_loss=0.1142, pruned_loss=0.02955, audio_tagging_loss=0.01118, over 3055162.01 frames. ], batch size: 57, lr: 1.12e-02, grad_scale: 16.0
2023-11-18 23:26:27,598 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=22.5
2023-11-18 23:26:33,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=459313.3333333333, ans=0.125
2023-11-18 23:26:41,852 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.65 vs. limit=15.0
2023-11-18 23:26:54,472 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 8800, loss[loss=0.09817, simple_loss=0.1163, pruned_loss=0.0291, audio_tagging_loss=0.01093, over 14766.00 frames. ], tot_loss[loss=0.09871, simple_loss=0.1154, pruned_loss=0.0298, audio_tagging_loss=0.0112, over 3052879.85 frames. ], batch size: 53, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:27:32,096 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.26 vs. limit=15.0
2023-11-18 23:27:38,832 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.027e+01 8.792e+01 9.740e+01 1.071e+02 1.410e+02, threshold=1.948e+02, percent-clipped=0.0
2023-11-18 23:27:49,272 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 8850, loss[loss=0.1072, simple_loss=0.1273, pruned_loss=0.03262, audio_tagging_loss=0.01091, over 15375.00 frames. ], tot_loss[loss=0.09864, simple_loss=0.1156, pruned_loss=0.0297, audio_tagging_loss=0.01116, over 3054082.71 frames. ], batch size: 57, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:27:58,827 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 23:28:04,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=459846.6666666667, ans=0.125
2023-11-18 23:28:09,967 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.56 vs. limit=15.0
2023-11-18 23:28:15,217 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.08 vs. limit=15.0
2023-11-18 23:28:20,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=459913.3333333333, ans=0.0
2023-11-18 23:28:22,902 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.64 vs. limit=22.5
2023-11-18 23:28:34,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=460046.6666666667, ans=0.0
2023-11-18 23:28:45,823 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 8900, loss[loss=0.1097, simple_loss=0.1318, pruned_loss=0.03586, audio_tagging_loss=0.007938, over 16096.00 frames. ], tot_loss[loss=0.09857, simple_loss=0.1154, pruned_loss=0.02991, audio_tagging_loss=0.01096, over 3060444.25 frames. ], batch size: 60, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:28:52,618 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.49 vs. limit=22.5
2023-11-18 23:28:53,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=460113.3333333333, ans=0.0
2023-11-18 23:29:05,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=460180.0, ans=0.1
2023-11-18 23:29:24,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=460313.3333333333, ans=0.0
2023-11-18 23:29:30,090 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.564e+01 9.040e+01 1.017e+02 1.156e+02 1.581e+02, threshold=2.035e+02, percent-clipped=0.0
2023-11-18 23:29:38,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=460380.0, ans=0.0
2023-11-18 23:29:41,803 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 8950, loss[loss=0.07835, simple_loss=0.09496, pruned_loss=0.02348, audio_tagging_loss=0.007391, over 14672.00 frames. ], tot_loss[loss=0.09837, simple_loss=0.1155, pruned_loss=0.02987, audio_tagging_loss=0.01077, over 3058056.96 frames. ], batch size: 56, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:30:05,948 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0
2023-11-18 23:30:17,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=460646.6666666667, ans=0.0
2023-11-18 23:30:20,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=460646.6666666667, ans=0.0
2023-11-18 23:30:28,694 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.58 vs. limit=12.0
2023-11-18 23:30:32,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=460713.3333333333, ans=0.125
2023-11-18 23:30:36,560 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 9000, loss[loss=0.1096, simple_loss=0.1271, pruned_loss=0.03509, audio_tagging_loss=0.01097, over 15982.00 frames. ], tot_loss[loss=0.09797, simple_loss=0.1148, pruned_loss=0.02969, audio_tagging_loss=0.01085, over 3052316.29 frames. ], batch size: 60, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:30:36,561 INFO [train_asr.py:1138] (3/4) Computing validation loss
2023-11-18 23:30:55,878 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.0780, 3.1227, 4.9925, 4.4491], device='cuda:3')
2023-11-18 23:31:06,591 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.9826, 2.8921, 2.8004, 3.1665, 3.1797, 2.5535, 2.9611, 3.0448], device='cuda:3')
2023-11-18 23:31:09,100 INFO [train_asr.py:1147] (3/4) Epoch 6, validation: loss=0.07051, simple_loss=0.05865, pruned_loss=0.008039, audio_tagging_loss=0.03315, over 4681554.00 frames.
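During the validation pass, zipformer.py logs one attention-weights entropy value per head (the tensors above); low entropy means a head concentrates on a few keys, high entropy means it spreads its attention broadly. A sketch of such a diagnostic under assumed shapes; the real hook may aggregate over queries or batches differently:

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        # attn: (num_heads, num_queries, num_keys), each row a distribution
        # over keys. Shannon entropy per head, averaged over queries.
        ent = -(attn * (attn + 1.0e-20).log()).sum(dim=-1)  # (heads, queries)
        return ent.mean(dim=-1)  # one entropy value per head, as in the log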
2023-11-18 23:31:09,101 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB
2023-11-18 23:31:09,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=460780.0, ans=0.125
2023-11-18 23:31:18,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=460780.0, ans=0.0
2023-11-18 23:31:37,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=460913.3333333333, ans=0.0
2023-11-18 23:31:39,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=460913.3333333333, ans=0.0
2023-11-18 23:31:53,345 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.759e+01 8.670e+01 9.700e+01 1.069e+02 1.408e+02, threshold=1.940e+02, percent-clipped=0.0
2023-11-18 23:32:03,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=461113.3333333333, ans=0.2
2023-11-18 23:32:04,555 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 9050, loss[loss=0.09386, simple_loss=0.1034, pruned_loss=0.0281, audio_tagging_loss=0.01406, over 15126.00 frames. ], tot_loss[loss=0.09832, simple_loss=0.1149, pruned_loss=0.02999, audio_tagging_loss=0.01085, over 3048422.17 frames. ], batch size: 55, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:32:19,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=461180.0, ans=10.0
2023-11-18 23:32:32,841 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.91 vs. limit=6.0
2023-11-18 23:32:35,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=461246.6666666667, ans=0.0
2023-11-18 23:32:35,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=461246.6666666667, ans=0.2
2023-11-18 23:32:51,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=461380.0, ans=0.0
2023-11-18 23:32:55,809 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.76 vs. limit=12.0
2023-11-18 23:32:59,356 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 9100, loss[loss=0.08036, simple_loss=0.09018, pruned_loss=0.02224, audio_tagging_loss=0.01303, over 15257.00 frames. ], tot_loss[loss=0.09754, simple_loss=0.1142, pruned_loss=0.02965, audio_tagging_loss=0.01082, over 3044002.84 frames. ], batch size: 58, lr: 1.12e-02, grad_scale: 32.0
], batch size: 58, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:33:07,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=461446.6666666667, ans=0.125 2023-11-18 23:33:16,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=461513.3333333333, ans=0.2 2023-11-18 23:33:35,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=461646.6666666667, ans=0.0 2023-11-18 23:33:43,699 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.366e+01 8.817e+01 9.477e+01 1.033e+02 1.344e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-18 23:33:52,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=461713.3333333333, ans=0.125 2023-11-18 23:33:54,715 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 9150, loss[loss=0.0806, simple_loss=0.09297, pruned_loss=0.02312, audio_tagging_loss=0.011, over 15838.00 frames. ], tot_loss[loss=0.09599, simple_loss=0.112, pruned_loss=0.0291, audio_tagging_loss=0.01088, over 3038293.58 frames. ], batch size: 60, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:34:03,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=461780.0, ans=0.125 2023-11-18 23:34:05,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=461846.6666666667, ans=0.125 2023-11-18 23:34:06,449 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.84 vs. limit=15.0 2023-11-18 23:34:12,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=461846.6666666667, ans=0.125 2023-11-18 23:34:19,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=461913.3333333333, ans=0.2 2023-11-18 23:34:24,953 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.44 vs. limit=15.0 2023-11-18 23:34:34,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=461980.0, ans=10.0 2023-11-18 23:34:50,578 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 9200, loss[loss=0.0811, simple_loss=0.09865, pruned_loss=0.02025, audio_tagging_loss=0.01152, over 14794.00 frames. ], tot_loss[loss=0.09652, simple_loss=0.1127, pruned_loss=0.02928, audio_tagging_loss=0.01087, over 3039717.44 frames. 
], batch size: 56, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:35:03,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=462180.0, ans=0.0 2023-11-18 23:35:14,262 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=462246.6666666667, ans=0.125 2023-11-18 23:35:15,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=462246.6666666667, ans=0.125 2023-11-18 23:35:25,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=462313.3333333333, ans=0.125 2023-11-18 23:35:33,914 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.421e+01 8.725e+01 9.517e+01 1.069e+02 1.464e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-18 23:35:41,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=462380.0, ans=0.5 2023-11-18 23:35:44,339 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 9250, loss[loss=0.08111, simple_loss=0.09614, pruned_loss=0.02196, audio_tagging_loss=0.01108, over 15517.00 frames. ], tot_loss[loss=0.09584, simple_loss=0.112, pruned_loss=0.029, audio_tagging_loss=0.01081, over 3044432.82 frames. ], batch size: 60, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:36:01,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=462513.3333333333, ans=0.0 2023-11-18 23:36:06,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=462580.0, ans=0.0 2023-11-18 23:36:12,229 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.35 vs. limit=12.0 2023-11-18 23:36:28,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=462713.3333333333, ans=0.125 2023-11-18 23:36:29,395 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.96 vs. limit=15.0 2023-11-18 23:36:37,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=462713.3333333333, ans=0.0 2023-11-18 23:36:39,674 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 9300, loss[loss=0.07562, simple_loss=0.09171, pruned_loss=0.01742, audio_tagging_loss=0.01234, over 15631.00 frames. ], tot_loss[loss=0.09575, simple_loss=0.1117, pruned_loss=0.02897, audio_tagging_loss=0.01093, over 3048813.28 frames. ], batch size: 58, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:36:53,439 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.49 vs. limit=15.0 2023-11-18 23:36:54,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=462846.6666666667, ans=0.125 2023-11-18 23:36:55,443 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. 
limit=15.0 2023-11-18 23:37:14,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=462980.0, ans=0.2 2023-11-18 23:37:23,841 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.724e+01 8.607e+01 9.427e+01 1.041e+02 1.346e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-18 23:37:35,924 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 9350, loss[loss=0.1231, simple_loss=0.1595, pruned_loss=0.03534, audio_tagging_loss=0.008007, over 15037.00 frames. ], tot_loss[loss=0.09624, simple_loss=0.1124, pruned_loss=0.02913, audio_tagging_loss=0.01092, over 3050379.43 frames. ], batch size: 54, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:37:57,474 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.25 vs. limit=22.5 2023-11-18 23:38:03,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=463246.6666666667, ans=0.0 2023-11-18 23:38:30,374 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 9400, loss[loss=0.1148, simple_loss=0.1414, pruned_loss=0.03465, audio_tagging_loss=0.009495, over 15185.00 frames. ], tot_loss[loss=0.09698, simple_loss=0.1131, pruned_loss=0.02933, audio_tagging_loss=0.01109, over 3045519.25 frames. ], batch size: 55, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:38:36,218 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.26 vs. limit=15.0 2023-11-18 23:38:50,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=463513.3333333333, ans=0.2 2023-11-18 23:39:09,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=463646.6666666667, ans=0.125 2023-11-18 23:39:15,428 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.264e+01 8.967e+01 9.961e+01 1.049e+02 1.500e+02, threshold=1.992e+02, percent-clipped=0.0 2023-11-18 23:39:18,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=463713.3333333333, ans=0.0 2023-11-18 23:39:19,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=463713.3333333333, ans=0.1 2023-11-18 23:39:21,675 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 23:39:25,410 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 9450, loss[loss=0.09647, simple_loss=0.1093, pruned_loss=0.02931, audio_tagging_loss=0.01254, over 14965.00 frames. ], tot_loss[loss=0.0974, simple_loss=0.1137, pruned_loss=0.02948, audio_tagging_loss=0.01107, over 3044640.94 frames. 
], batch size: 56, lr: 1.12e-02, grad_scale: 16.0 2023-11-18 23:39:33,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=463780.0, ans=0.125 2023-11-18 23:39:50,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=463913.3333333333, ans=0.125 2023-11-18 23:40:08,309 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=15.0 2023-11-18 23:40:09,494 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.14 vs. limit=15.0 2023-11-18 23:40:14,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=464046.6666666667, ans=0.125 2023-11-18 23:40:21,506 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 9500, loss[loss=0.1216, simple_loss=0.1414, pruned_loss=0.03809, audio_tagging_loss=0.01281, over 15692.00 frames. ], tot_loss[loss=0.09818, simple_loss=0.1146, pruned_loss=0.02975, audio_tagging_loss=0.01113, over 3048311.91 frames. ], batch size: 55, lr: 1.12e-02, grad_scale: 16.0 2023-11-18 23:40:27,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=464113.3333333333, ans=0.125 2023-11-18 23:40:36,336 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.37 vs. limit=15.0 2023-11-18 23:40:43,914 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=15.0 2023-11-18 23:41:07,221 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.625e+01 9.036e+01 9.803e+01 1.077e+02 1.985e+02, threshold=1.961e+02, percent-clipped=0.0 2023-11-18 23:41:09,885 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.48 vs. limit=22.5 2023-11-18 23:41:10,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=464380.0, ans=0.2 2023-11-18 23:41:17,268 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 9550, loss[loss=0.1346, simple_loss=0.1479, pruned_loss=0.04948, audio_tagging_loss=0.01111, over 15252.00 frames. ], tot_loss[loss=0.09883, simple_loss=0.1151, pruned_loss=0.03, audio_tagging_loss=0.01127, over 3050269.24 frames. 
], batch size: 57, lr: 1.11e-02, grad_scale: 16.0 2023-11-18 23:41:20,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=464446.6666666667, ans=0.125 2023-11-18 23:41:20,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=464446.6666666667, ans=0.125 2023-11-18 23:41:23,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=464446.6666666667, ans=0.0 2023-11-18 23:41:36,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=464513.3333333333, ans=0.05 2023-11-18 23:41:56,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=464646.6666666667, ans=0.04949747468305833 2023-11-18 23:42:04,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=464713.3333333333, ans=0.0 2023-11-18 23:42:11,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=464780.0, ans=0.2 2023-11-18 23:42:12,591 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 9600, loss[loss=0.1034, simple_loss=0.1248, pruned_loss=0.02868, audio_tagging_loss=0.0123, over 14903.00 frames. ], tot_loss[loss=0.09875, simple_loss=0.115, pruned_loss=0.02991, audio_tagging_loss=0.01133, over 3047950.28 frames. ], batch size: 56, lr: 1.11e-02, grad_scale: 32.0 2023-11-18 23:42:12,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=464780.0, ans=0.125 2023-11-18 23:42:13,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=464780.0, ans=10.0 2023-11-18 23:42:37,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=464913.3333333333, ans=0.04949747468305833 2023-11-18 23:42:45,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=464980.0, ans=0.1 2023-11-18 23:42:57,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=465046.6666666667, ans=0.125 2023-11-18 23:42:58,488 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.512e+01 8.742e+01 9.765e+01 1.051e+02 1.389e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-18 23:43:09,309 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 9650, loss[loss=0.08634, simple_loss=0.1035, pruned_loss=0.02627, audio_tagging_loss=0.00834, over 16251.00 frames. ], tot_loss[loss=0.09825, simple_loss=0.1144, pruned_loss=0.02979, audio_tagging_loss=0.01128, over 3045204.78 frames. 
], batch size: 62, lr: 1.11e-02, grad_scale: 32.0 2023-11-18 23:43:09,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=465113.3333333333, ans=0.125 2023-11-18 23:43:13,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=465113.3333333333, ans=0.125 2023-11-18 23:43:15,900 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=465113.3333333333, ans=0.125 2023-11-18 23:43:25,276 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.44 vs. limit=22.5 2023-11-18 23:43:25,340 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.36 vs. limit=15.0 2023-11-18 23:43:34,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=465246.6666666667, ans=0.1 2023-11-18 23:43:44,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=465313.3333333333, ans=0.0 2023-11-18 23:43:51,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=465313.3333333333, ans=0.0 2023-11-18 23:44:04,658 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 9700, loss[loss=0.08466, simple_loss=0.09781, pruned_loss=0.02717, audio_tagging_loss=0.00859, over 16166.00 frames. ], tot_loss[loss=0.09803, simple_loss=0.1147, pruned_loss=0.02958, audio_tagging_loss=0.01108, over 3043770.32 frames. ], batch size: 62, lr: 1.11e-02, grad_scale: 32.0 2023-11-18 23:44:09,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=465446.6666666667, ans=0.1 2023-11-18 23:44:11,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=465446.6666666667, ans=0.02 2023-11-18 23:44:13,045 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.09 vs. limit=22.5 2023-11-18 23:44:30,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=465580.0, ans=0.2 2023-11-18 23:44:34,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=465580.0, ans=0.0 2023-11-18 23:44:35,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=465580.0, ans=0.2 2023-11-18 23:44:40,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=465646.6666666667, ans=0.2 2023-11-18 23:44:50,713 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.321e+01 8.590e+01 9.606e+01 1.115e+02 1.456e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-18 23:45:00,258 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 9750, loss[loss=0.1202, simple_loss=0.1438, pruned_loss=0.03947, audio_tagging_loss=0.008824, over 14978.00 frames. ], tot_loss[loss=0.09825, simple_loss=0.1155, pruned_loss=0.02964, audio_tagging_loss=0.01086, over 3045966.05 frames. 
], batch size: 55, lr: 1.11e-02, grad_scale: 32.0 2023-11-18 23:45:01,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=465780.0, ans=0.2 2023-11-18 23:45:21,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=465846.6666666667, ans=0.125 2023-11-18 23:45:25,958 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.34 vs. limit=12.0 2023-11-18 23:45:37,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=465980.0, ans=0.07 2023-11-18 23:45:50,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=466046.6666666667, ans=0.0 2023-11-18 23:45:51,322 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.69 vs. limit=12.0 2023-11-18 23:45:57,112 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 9800, loss[loss=0.09748, simple_loss=0.1088, pruned_loss=0.03058, audio_tagging_loss=0.0125, over 13927.00 frames. ], tot_loss[loss=0.09747, simple_loss=0.114, pruned_loss=0.02954, audio_tagging_loss=0.01092, over 3039744.29 frames. ], batch size: 55, lr: 1.11e-02, grad_scale: 32.0 2023-11-18 23:45:57,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=466113.3333333333, ans=0.125 2023-11-18 23:46:12,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=466180.0, ans=0.125 2023-11-18 23:46:21,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=466246.6666666667, ans=0.0 2023-11-18 23:46:36,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=466313.3333333333, ans=0.125 2023-11-18 23:46:38,706 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.47 vs. limit=12.0 2023-11-18 23:46:42,476 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.868e+01 8.589e+01 9.720e+01 1.056e+02 1.437e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-18 23:46:44,613 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 23:46:46,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=466380.0, ans=0.0 2023-11-18 23:46:47,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=466380.0, ans=0.2 2023-11-18 23:46:52,054 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 9850, loss[loss=0.1105, simple_loss=0.1252, pruned_loss=0.03693, audio_tagging_loss=0.01098, over 15493.00 frames. ], tot_loss[loss=0.09774, simple_loss=0.1146, pruned_loss=0.0296, audio_tagging_loss=0.01085, over 3043526.48 frames. 
], batch size: 56, lr: 1.11e-02, grad_scale: 32.0 2023-11-18 23:46:54,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=466446.6666666667, ans=0.0 2023-11-18 23:46:54,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=466446.6666666667, ans=0.125 2023-11-18 23:46:59,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=466446.6666666667, ans=0.125 2023-11-18 23:47:21,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=466580.0, ans=0.125 2023-11-18 23:47:44,603 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.06 vs. limit=15.0 2023-11-18 23:47:47,560 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 9900, loss[loss=0.08636, simple_loss=0.1029, pruned_loss=0.02525, audio_tagging_loss=0.009655, over 14674.00 frames. ], tot_loss[loss=0.09758, simple_loss=0.1145, pruned_loss=0.02954, audio_tagging_loss=0.01079, over 3049158.49 frames. ], batch size: 57, lr: 1.11e-02, grad_scale: 32.0 2023-11-18 23:47:55,009 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=15.0 2023-11-18 23:48:00,991 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.95 vs. limit=15.0 2023-11-18 23:48:10,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=466913.3333333333, ans=0.0 2023-11-18 23:48:19,283 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.69 vs. limit=22.5 2023-11-18 23:48:25,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=466980.0, ans=0.125 2023-11-18 23:48:32,816 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.839e+01 9.469e+01 1.066e+02 1.468e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-18 23:48:43,400 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 9950, loss[loss=0.0871, simple_loss=0.1024, pruned_loss=0.02429, audio_tagging_loss=0.01164, over 14755.00 frames. ], tot_loss[loss=0.09755, simple_loss=0.1144, pruned_loss=0.02951, audio_tagging_loss=0.01086, over 3051612.93 frames. ], batch size: 56, lr: 1.11e-02, grad_scale: 32.0 2023-11-18 23:48:55,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=467180.0, ans=0.1 2023-11-18 23:48:55,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=467180.0, ans=0.2 2023-11-18 23:49:08,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=467246.6666666667, ans=0.125 2023-11-18 23:49:20,362 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.92 vs. 
limit=22.5 2023-11-18 23:49:25,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=467313.3333333333, ans=0.1 2023-11-18 23:49:37,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=467446.6666666667, ans=0.125 2023-11-18 23:49:38,709 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 10000, loss[loss=0.08375, simple_loss=0.1011, pruned_loss=0.02458, audio_tagging_loss=0.008599, over 15992.00 frames. ], tot_loss[loss=0.09658, simple_loss=0.1134, pruned_loss=0.02904, audio_tagging_loss=0.01085, over 3048778.69 frames. ], batch size: 59, lr: 1.11e-02, grad_scale: 32.0 2023-11-18 23:49:58,000 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.52 vs. limit=22.5 2023-11-18 23:50:04,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=467580.0, ans=0.125 2023-11-18 23:50:17,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=467646.6666666667, ans=0.1 2023-11-18 23:50:23,828 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.767e+01 8.943e+01 9.834e+01 1.077e+02 1.357e+02, threshold=1.967e+02, percent-clipped=0.0 2023-11-18 23:50:25,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=467713.3333333333, ans=0.0 2023-11-18 23:50:33,363 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 10050, loss[loss=0.134, simple_loss=0.172, pruned_loss=0.04029, audio_tagging_loss=0.007747, over 16907.00 frames. ], tot_loss[loss=0.09713, simple_loss=0.1139, pruned_loss=0.02927, audio_tagging_loss=0.01092, over 3048854.87 frames. ], batch size: 56, lr: 1.11e-02, grad_scale: 32.0 2023-11-18 23:50:58,249 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.19 vs. limit=15.0 2023-11-18 23:51:03,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=467913.3333333333, ans=0.125 2023-11-18 23:51:28,608 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 10100, loss[loss=0.114, simple_loss=0.1247, pruned_loss=0.0372, audio_tagging_loss=0.01449, over 15133.00 frames. ], tot_loss[loss=0.09729, simple_loss=0.1141, pruned_loss=0.02936, audio_tagging_loss=0.01089, over 3050591.28 frames. ], batch size: 55, lr: 1.11e-02, grad_scale: 16.0 2023-11-18 23:51:28,824 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 23:51:35,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=468113.3333333333, ans=0.2 2023-11-18 23:52:10,332 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 23:52:10,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=468313.3333333333, ans=0.125 2023-11-18 23:52:15,592 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.783e+01 8.875e+01 9.595e+01 1.083e+02 1.455e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-18 23:52:16,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=468380.0, ans=0.125 2023-11-18 23:52:24,029 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 10150, loss[loss=0.08965, simple_loss=0.1022, pruned_loss=0.02371, audio_tagging_loss=0.01486, over 15270.00 frames. ], tot_loss[loss=0.09726, simple_loss=0.114, pruned_loss=0.02924, audio_tagging_loss=0.01101, over 3048247.36 frames. ], batch size: 58, lr: 1.11e-02, grad_scale: 16.0 2023-11-18 23:52:24,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=468446.6666666667, ans=0.0 2023-11-18 23:52:31,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=468446.6666666667, ans=0.0 2023-11-18 23:52:34,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=468513.3333333333, ans=0.1 2023-11-18 23:52:39,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=468513.3333333333, ans=0.1 2023-11-18 23:52:42,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=468513.3333333333, ans=0.0 2023-11-18 23:52:45,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=468580.0, ans=0.035 2023-11-18 23:52:46,797 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 23:52:47,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=468580.0, ans=0.125 2023-11-18 23:52:54,312 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=22.5 2023-11-18 23:52:56,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=468646.6666666667, ans=0.2 2023-11-18 23:53:13,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=468713.3333333333, ans=0.05 2023-11-18 23:53:18,612 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 10200, loss[loss=0.09891, simple_loss=0.118, pruned_loss=0.03125, audio_tagging_loss=0.008674, over 15729.00 frames. ], tot_loss[loss=0.09645, simple_loss=0.1131, pruned_loss=0.02879, audio_tagging_loss=0.01114, over 3046871.53 frames. 
], batch size: 58, lr: 1.11e-02, grad_scale: 16.0 2023-11-18 23:53:21,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=468780.0, ans=0.025 2023-11-18 23:53:37,351 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 23:53:48,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=468913.3333333333, ans=0.0 2023-11-18 23:53:52,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=468980.0, ans=0.125 2023-11-18 23:53:52,556 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.55 vs. limit=15.0 2023-11-18 23:54:04,805 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 8.998e+01 1.003e+02 1.100e+02 1.354e+02, threshold=2.006e+02, percent-clipped=0.0 2023-11-18 23:54:06,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=469046.6666666667, ans=0.125 2023-11-18 23:54:13,816 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 10250, loss[loss=0.09919, simple_loss=0.1222, pruned_loss=0.02777, audio_tagging_loss=0.01033, over 15284.00 frames. ], tot_loss[loss=0.09678, simple_loss=0.1133, pruned_loss=0.02899, audio_tagging_loss=0.01112, over 3048057.68 frames. ], batch size: 56, lr: 1.11e-02, grad_scale: 16.0 2023-11-18 23:54:15,951 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.07 vs. limit=15.0 2023-11-18 23:54:16,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=469113.3333333333, ans=0.2 2023-11-18 23:54:19,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=469113.3333333333, ans=0.0 2023-11-18 23:54:22,503 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.25 vs. 
limit=15.0 2023-11-18 23:54:25,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=469180.0, ans=0.0 2023-11-18 23:54:30,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=469180.0, ans=0.125 2023-11-18 23:54:35,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=469246.6666666667, ans=0.015 2023-11-18 23:54:46,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=469313.3333333333, ans=0.0 2023-11-18 23:54:47,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=469313.3333333333, ans=0.125 2023-11-18 23:54:50,198 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.25 vs. limit=5.0 2023-11-18 23:54:55,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=469313.3333333333, ans=0.125 2023-11-18 23:55:10,096 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 10300, loss[loss=0.0996, simple_loss=0.1158, pruned_loss=0.03021, audio_tagging_loss=0.01148, over 15582.00 frames. ], tot_loss[loss=0.097, simple_loss=0.1133, pruned_loss=0.02916, audio_tagging_loss=0.01117, over 3051680.94 frames. ], batch size: 60, lr: 1.11e-02, grad_scale: 16.0 2023-11-18 23:55:10,324 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 23:55:11,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=469446.6666666667, ans=0.1 2023-11-18 23:55:18,794 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 23:55:19,186 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. limit=6.0 2023-11-18 23:55:27,651 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=22.5 2023-11-18 23:55:29,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=469513.3333333333, ans=0.2 2023-11-18 23:55:47,589 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.02 vs. limit=15.0 2023-11-18 23:55:56,572 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 9.104e+01 1.005e+02 1.145e+02 1.607e+02, threshold=2.009e+02, percent-clipped=0.0 2023-11-18 23:56:03,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=469713.3333333333, ans=0.0 2023-11-18 23:56:04,458 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.16 vs. limit=15.0 2023-11-18 23:56:05,046 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 10350, loss[loss=0.1097, simple_loss=0.1397, pruned_loss=0.03271, audio_tagging_loss=0.00709, over 15392.00 frames. 
], tot_loss[loss=0.09772, simple_loss=0.1141, pruned_loss=0.02947, audio_tagging_loss=0.01121, over 3052163.39 frames. ], batch size: 55, lr: 1.11e-02, grad_scale: 16.0 2023-11-18 23:56:06,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=469780.0, ans=0.0 2023-11-18 23:56:06,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=469780.0, ans=0.2 2023-11-18 23:56:06,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=469780.0, ans=0.0 2023-11-18 23:56:45,684 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.19 vs. limit=10.0 2023-11-18 23:56:50,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=470046.6666666667, ans=0.125 2023-11-18 23:56:53,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=470046.6666666667, ans=0.125 2023-11-18 23:57:00,327 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 10400, loss[loss=0.1134, simple_loss=0.1288, pruned_loss=0.03807, audio_tagging_loss=0.01091, over 15458.00 frames. ], tot_loss[loss=0.09694, simple_loss=0.1128, pruned_loss=0.02916, audio_tagging_loss=0.01136, over 3050820.70 frames. ], batch size: 57, lr: 1.11e-02, grad_scale: 32.0 2023-11-18 23:57:02,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=470113.3333333333, ans=0.2 2023-11-18 23:57:35,938 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=470313.3333333333, ans=0.0 2023-11-18 23:57:42,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=470313.3333333333, ans=0.2 2023-11-18 23:57:46,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=470380.0, ans=0.125 2023-11-18 23:57:47,235 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.298e+01 8.610e+01 9.344e+01 1.013e+02 2.407e+02, threshold=1.869e+02, percent-clipped=1.0 2023-11-18 23:57:56,725 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 10450, loss[loss=0.1146, simple_loss=0.1349, pruned_loss=0.0352, audio_tagging_loss=0.01193, over 14880.00 frames. ], tot_loss[loss=0.09671, simple_loss=0.1125, pruned_loss=0.02911, audio_tagging_loss=0.01136, over 3048848.80 frames. ], batch size: 54, lr: 1.11e-02, grad_scale: 32.0 2023-11-18 23:57:59,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=470446.6666666667, ans=0.125 2023-11-18 23:58:04,814 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.93 vs. 
limit=15.0 2023-11-18 23:58:30,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=470646.6666666667, ans=0.0 2023-11-18 23:58:35,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=470646.6666666667, ans=0.125 2023-11-18 23:58:40,380 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 23:58:51,747 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 10500, loss[loss=0.07187, simple_loss=0.07983, pruned_loss=0.01726, audio_tagging_loss=0.01469, over 14502.00 frames. ], tot_loss[loss=0.0958, simple_loss=0.1113, pruned_loss=0.02891, audio_tagging_loss=0.01125, over 3048066.79 frames. ], batch size: 56, lr: 1.11e-02, grad_scale: 16.0 2023-11-18 23:59:00,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=470780.0, ans=0.125 2023-11-18 23:59:21,028 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.03 vs. limit=22.5 2023-11-18 23:59:38,866 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.982e+01 8.568e+01 9.489e+01 1.065e+02 1.523e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-18 23:59:46,868 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 10550, loss[loss=0.1068, simple_loss=0.1311, pruned_loss=0.03117, audio_tagging_loss=0.01002, over 15119.00 frames. ], tot_loss[loss=0.09594, simple_loss=0.1121, pruned_loss=0.02884, audio_tagging_loss=0.01106, over 3045724.81 frames. ], batch size: 55, lr: 1.11e-02, grad_scale: 16.0 2023-11-19 00:00:17,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=471246.6666666667, ans=0.125 2023-11-19 00:00:43,110 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 10600, loss[loss=0.06699, simple_loss=0.07622, pruned_loss=0.01723, audio_tagging_loss=0.01166, over 15659.00 frames. ], tot_loss[loss=0.09559, simple_loss=0.1117, pruned_loss=0.02874, audio_tagging_loss=0.01101, over 3045560.73 frames. ], batch size: 60, lr: 1.11e-02, grad_scale: 16.0 2023-11-19 00:00:45,719 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.17 vs. limit=22.5 2023-11-19 00:00:47,938 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.51 vs. limit=6.0 2023-11-19 00:01:17,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=471646.6666666667, ans=0.125 2023-11-19 00:01:31,013 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.594e+01 9.038e+01 9.665e+01 1.088e+02 1.655e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-19 00:01:38,981 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 10650, loss[loss=0.09224, simple_loss=0.1105, pruned_loss=0.0256, audio_tagging_loss=0.01139, over 15357.00 frames. ], tot_loss[loss=0.09546, simple_loss=0.1113, pruned_loss=0.02874, audio_tagging_loss=0.01107, over 3035140.89 frames. 
], batch size: 58, lr: 1.11e-02, grad_scale: 16.0 2023-11-19 00:01:40,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=471780.0, ans=0.025 2023-11-19 00:01:49,802 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=471846.6666666667, ans=0.1 2023-11-19 00:01:50,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=471846.6666666667, ans=0.0 2023-11-19 00:01:54,976 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.51 vs. limit=10.0 2023-11-19 00:02:18,798 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:02:19,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=471980.0, ans=0.125 2023-11-19 00:02:22,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=472046.6666666667, ans=0.0 2023-11-19 00:02:34,788 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 10700, loss[loss=0.1218, simple_loss=0.1377, pruned_loss=0.04123, audio_tagging_loss=0.01171, over 14587.00 frames. ], tot_loss[loss=0.09528, simple_loss=0.1113, pruned_loss=0.02857, audio_tagging_loss=0.01107, over 3038203.78 frames. ], batch size: 54, lr: 1.11e-02, grad_scale: 16.0 2023-11-19 00:02:35,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=472113.3333333333, ans=0.1 2023-11-19 00:02:38,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=472113.3333333333, ans=0.0 2023-11-19 00:02:39,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=472113.3333333333, ans=0.1 2023-11-19 00:02:43,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=472113.3333333333, ans=0.125 2023-11-19 00:02:54,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=472180.0, ans=0.0 2023-11-19 00:02:55,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=472180.0, ans=0.2 2023-11-19 00:02:59,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=472246.6666666667, ans=0.125 2023-11-19 00:03:06,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=472246.6666666667, ans=0.125 2023-11-19 00:03:22,302 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.388e+01 8.951e+01 9.625e+01 1.080e+02 1.426e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-19 00:03:25,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=472380.0, ans=0.125 2023-11-19 00:03:30,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=472446.6666666667, ans=0.0 2023-11-19 00:03:30,927 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 10750, loss[loss=0.108, 
simple_loss=0.1256, pruned_loss=0.03327, audio_tagging_loss=0.01196, over 15447.00 frames. ], tot_loss[loss=0.09513, simple_loss=0.1112, pruned_loss=0.02848, audio_tagging_loss=0.01105, over 3045777.73 frames. ], batch size: 56, lr: 1.11e-02, grad_scale: 16.0 2023-11-19 00:03:50,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=472513.3333333333, ans=0.1 2023-11-19 00:04:21,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=472713.3333333333, ans=0.125 2023-11-19 00:04:25,556 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 10800, loss[loss=0.07939, simple_loss=0.0995, pruned_loss=0.01929, audio_tagging_loss=0.01035, over 14981.00 frames. ], tot_loss[loss=0.09472, simple_loss=0.111, pruned_loss=0.02818, audio_tagging_loss=0.01104, over 3048617.19 frames. ], batch size: 59, lr: 1.11e-02, grad_scale: 32.0 2023-11-19 00:04:29,507 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:04:53,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=472913.3333333333, ans=0.1 2023-11-19 00:05:09,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=473046.6666666667, ans=0.0 2023-11-19 00:05:13,539 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.653e+01 8.608e+01 9.354e+01 1.065e+02 1.440e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-19 00:05:20,954 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 10850, loss[loss=0.1041, simple_loss=0.1275, pruned_loss=0.03064, audio_tagging_loss=0.009649, over 16483.00 frames. ], tot_loss[loss=0.09496, simple_loss=0.1113, pruned_loss=0.02826, audio_tagging_loss=0.01106, over 3051040.45 frames. ], batch size: 60, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:05:35,539 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:05:58,749 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.85 vs. limit=15.0 2023-11-19 00:06:05,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=473380.0, ans=0.125 2023-11-19 00:06:11,157 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 00:06:11,798 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.99 vs. limit=15.0 2023-11-19 00:06:14,467 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.55 vs. 
limit=15.0 2023-11-19 00:06:17,537 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 10900, loss[loss=0.1018, simple_loss=0.1238, pruned_loss=0.02898, audio_tagging_loss=0.01089, over 16255.00 frames. ], tot_loss[loss=0.09527, simple_loss=0.1118, pruned_loss=0.02837, audio_tagging_loss=0.01102, over 3050676.27 frames. ], batch size: 60, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:06:22,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=473446.6666666667, ans=0.2 2023-11-19 00:06:30,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=473513.3333333333, ans=0.0 2023-11-19 00:06:33,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=473513.3333333333, ans=0.0 2023-11-19 00:06:44,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=473580.0, ans=0.0 2023-11-19 00:07:01,710 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.60 vs. limit=5.0 2023-11-19 00:07:02,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=473713.3333333333, ans=0.0 2023-11-19 00:07:05,073 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.196e+01 8.580e+01 9.550e+01 1.089e+02 1.595e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-19 00:07:09,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=473713.3333333333, ans=0.2 2023-11-19 00:07:12,537 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 10950, loss[loss=0.1325, simple_loss=0.1463, pruned_loss=0.04849, audio_tagging_loss=0.0109, over 15099.00 frames. ], tot_loss[loss=0.09487, simple_loss=0.1113, pruned_loss=0.02817, audio_tagging_loss=0.01104, over 3057407.04 frames. 
], batch size: 55, lr: 1.10e-02, grad_scale: 16.0 2023-11-19 00:07:14,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=473780.0, ans=0.04949747468305833 2023-11-19 00:07:15,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=473780.0, ans=0.125 2023-11-19 00:07:15,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=473780.0, ans=0.2 2023-11-19 00:07:33,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=473913.3333333333, ans=0.125 2023-11-19 00:07:33,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=473913.3333333333, ans=0.2 2023-11-19 00:07:35,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=473913.3333333333, ans=0.125 2023-11-19 00:07:45,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=473980.0, ans=0.125 2023-11-19 00:07:56,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=474046.6666666667, ans=0.0 2023-11-19 00:07:59,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=474046.6666666667, ans=0.125 2023-11-19 00:08:02,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=474046.6666666667, ans=0.1 2023-11-19 00:08:07,631 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 11000, loss[loss=0.1238, simple_loss=0.1482, pruned_loss=0.0422, audio_tagging_loss=0.007519, over 16533.00 frames. ], tot_loss[loss=0.09537, simple_loss=0.112, pruned_loss=0.02839, audio_tagging_loss=0.01098, over 3055110.21 frames. ], batch size: 59, lr: 1.10e-02, grad_scale: 16.0 2023-11-19 00:08:11,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=474113.3333333333, ans=0.125 2023-11-19 00:08:14,385 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 00:08:15,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=474113.3333333333, ans=0.125 2023-11-19 00:08:22,787 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.24 vs. 
limit=12.0 2023-11-19 00:08:30,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=474246.6666666667, ans=0.125 2023-11-19 00:08:43,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=474313.3333333333, ans=0.125 2023-11-19 00:08:46,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=474313.3333333333, ans=0.0 2023-11-19 00:08:56,476 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 9.044e+01 9.862e+01 1.100e+02 1.802e+02, threshold=1.972e+02, percent-clipped=0.0 2023-11-19 00:09:03,376 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 11050, loss[loss=0.1052, simple_loss=0.1218, pruned_loss=0.03, audio_tagging_loss=0.01428, over 15341.00 frames. ], tot_loss[loss=0.09593, simple_loss=0.1128, pruned_loss=0.02845, audio_tagging_loss=0.01106, over 3053522.14 frames. ], batch size: 54, lr: 1.10e-02, grad_scale: 16.0 2023-11-19 00:09:07,486 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.86 vs. limit=15.0 2023-11-19 00:09:11,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=474446.6666666667, ans=0.2 2023-11-19 00:09:21,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=474513.3333333333, ans=0.125 2023-11-19 00:09:26,843 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.92 vs. limit=15.0 2023-11-19 00:09:27,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=474580.0, ans=0.125 2023-11-19 00:09:42,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=474646.6666666667, ans=0.125 2023-11-19 00:09:53,190 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.18 vs. limit=6.0 2023-11-19 00:09:59,044 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 11100, loss[loss=0.08281, simple_loss=0.09671, pruned_loss=0.02232, audio_tagging_loss=0.01213, over 15554.00 frames. ], tot_loss[loss=0.09627, simple_loss=0.1131, pruned_loss=0.02851, audio_tagging_loss=0.01122, over 3049229.07 frames. ], batch size: 59, lr: 1.10e-02, grad_scale: 16.0 2023-11-19 00:10:19,803 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.23 vs. 
limit=6.0 2023-11-19 00:10:20,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=474913.3333333333, ans=0.035 2023-11-19 00:10:28,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=474913.3333333333, ans=0.125 2023-11-19 00:10:30,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=474913.3333333333, ans=0.0 2023-11-19 00:10:47,799 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.819e+01 8.698e+01 9.641e+01 1.040e+02 1.445e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-19 00:10:54,132 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 11150, loss[loss=0.1093, simple_loss=0.1333, pruned_loss=0.03064, audio_tagging_loss=0.01197, over 14114.00 frames. ], tot_loss[loss=0.09657, simple_loss=0.1134, pruned_loss=0.02863, audio_tagging_loss=0.01126, over 3049957.60 frames. ], batch size: 53, lr: 1.10e-02, grad_scale: 16.0 2023-11-19 00:11:00,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=475113.3333333333, ans=0.2 2023-11-19 00:11:00,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=475113.3333333333, ans=0.2 2023-11-19 00:11:14,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=475180.0, ans=0.125 2023-11-19 00:11:24,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=475246.6666666667, ans=0.1 2023-11-19 00:11:49,596 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 11200, loss[loss=0.0992, simple_loss=0.1175, pruned_loss=0.03148, audio_tagging_loss=0.008975, over 15253.00 frames. ], tot_loss[loss=0.09608, simple_loss=0.1126, pruned_loss=0.02842, audio_tagging_loss=0.01136, over 3051673.35 frames. ], batch size: 58, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:11:52,007 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2023-11-19 00:12:15,703 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.67 vs. limit=15.0 2023-11-19 00:12:25,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=475646.6666666667, ans=0.0 2023-11-19 00:12:29,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=475646.6666666667, ans=0.0 2023-11-19 00:12:39,480 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.261e+01 8.628e+01 9.761e+01 1.045e+02 1.473e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-19 00:12:45,836 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 11250, loss[loss=0.1015, simple_loss=0.1208, pruned_loss=0.03185, audio_tagging_loss=0.009248, over 15991.00 frames. ], tot_loss[loss=0.09521, simple_loss=0.1114, pruned_loss=0.0282, audio_tagging_loss=0.01129, over 3049197.31 frames. ], batch size: 57, lr: 1.10e-02, grad_scale: 32.0
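
Annotation: the dense runs of [scaling.py:213] ScheduledFloat entries above track hyperparameters (dropout_p, skip rates, balancer probabilities, scale/bypass floors) whose logged value ans is a deterministic function of batch_count. Behavior like this can be modeled as piecewise-linear interpolation between (batch_count, value) breakpoints; the sketch below is an assumption for illustration (breakpoints invented), not the real scaling.py class:

    # Sketch: a scalar hyperparameter scheduled on batch count, linearly
    # interpolated between breakpoints. Assumed behavior, for illustration.
    import bisect

    class ScheduledFloat:
        def __init__(self, *points):
            self.points = sorted(points)  # (batch_count, value) pairs
            self.batch_count = 0.0

        def __float__(self) -> float:
            xs = [x for x, _ in self.points]
            if self.batch_count <= xs[0]:
                return float(self.points[0][1])
            if self.batch_count >= xs[-1]:
                return float(self.points[-1][1])
            i = bisect.bisect_right(xs, self.batch_count)
            (x0, y0), (x1, y1) = self.points[i - 1], self.points[i]
            t = (self.batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)

    # Hypothetical schedule: dropout annealed from 0.3 to 0.1 over 20k batches;
    # far past the last breakpoint it pins at 0.1, like the ans=0.1 lines above.
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    dropout_p.batch_count = 475246.6666666667
    print(float(dropout_p))  # -> 0.1

2023-11-19 00:13:12,895 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.23 vs.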
limit=15.0 2023-11-19 00:13:16,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=475913.3333333333, ans=0.125 2023-11-19 00:13:25,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=475980.0, ans=0.1 2023-11-19 00:13:39,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=476046.6666666667, ans=0.125 2023-11-19 00:13:41,020 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 11300, loss[loss=0.09703, simple_loss=0.1122, pruned_loss=0.02885, audio_tagging_loss=0.01206, over 14480.00 frames. ], tot_loss[loss=0.09555, simple_loss=0.112, pruned_loss=0.0284, audio_tagging_loss=0.01112, over 3047641.21 frames. ], batch size: 56, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:13:57,736 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=476180.0, ans=0.2 2023-11-19 00:14:10,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=476246.6666666667, ans=0.125 2023-11-19 00:14:28,065 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.51 vs. limit=6.0 2023-11-19 00:14:29,495 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.167e+01 8.711e+01 9.512e+01 1.035e+02 1.315e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-19 00:14:36,318 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 11350, loss[loss=0.0854, simple_loss=0.1067, pruned_loss=0.02168, audio_tagging_loss=0.01035, over 16446.00 frames. ], tot_loss[loss=0.09496, simple_loss=0.1116, pruned_loss=0.02814, audio_tagging_loss=0.01101, over 3041817.90 frames. ], batch size: 61, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:14:36,892 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.32 vs. limit=15.0 2023-11-19 00:14:47,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=476513.3333333333, ans=0.125 2023-11-19 00:15:10,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=476646.6666666667, ans=0.125 2023-11-19 00:15:17,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=476646.6666666667, ans=0.125 2023-11-19 00:15:21,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=476713.3333333333, ans=0.5 2023-11-19 00:15:32,155 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.03 vs. limit=15.0 2023-11-19 00:15:32,735 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 11400, loss[loss=0.1001, simple_loss=0.1094, pruned_loss=0.03023, audio_tagging_loss=0.01519, over 15852.00 frames. ], tot_loss[loss=0.09464, simple_loss=0.1111, pruned_loss=0.02801, audio_tagging_loss=0.01107, over 3043169.92 frames. 
], batch size: 62, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:15:43,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=476846.6666666667, ans=0.125 2023-11-19 00:15:50,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=476846.6666666667, ans=0.2 2023-11-19 00:16:02,273 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:16:15,329 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.11 vs. limit=15.0 2023-11-19 00:16:20,944 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.841e+01 9.746e+01 1.056e+02 1.411e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-19 00:16:21,520 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.61 vs. limit=15.0 2023-11-19 00:16:27,298 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 11450, loss[loss=0.05911, simple_loss=0.06938, pruned_loss=0.01188, audio_tagging_loss=0.01254, over 15484.00 frames. ], tot_loss[loss=0.09614, simple_loss=0.1131, pruned_loss=0.02878, audio_tagging_loss=0.01082, over 3046550.37 frames. ], batch size: 62, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:17:15,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=477380.0, ans=0.0 2023-11-19 00:17:16,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=477380.0, ans=0.125 2023-11-19 00:17:18,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=477380.0, ans=0.125 2023-11-19 00:17:22,973 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 11500, loss[loss=0.1031, simple_loss=0.1353, pruned_loss=0.02787, audio_tagging_loss=0.007635, over 14895.00 frames. ], tot_loss[loss=0.09586, simple_loss=0.1126, pruned_loss=0.02868, audio_tagging_loss=0.01088, over 3048666.58 frames. ], batch size: 54, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:17:34,613 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.09 vs. limit=22.5 2023-11-19 00:17:36,850 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.23 vs. limit=22.5 2023-11-19 00:18:00,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=477646.6666666667, ans=0.2 2023-11-19 00:18:11,938 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.402e+01 8.969e+01 9.661e+01 1.076e+02 1.537e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-19 00:18:19,403 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 11550, loss[loss=0.1136, simple_loss=0.1292, pruned_loss=0.04033, audio_tagging_loss=0.008685, over 15298.00 frames. ], tot_loss[loss=0.09577, simple_loss=0.1122, pruned_loss=0.02865, audio_tagging_loss=0.011, over 3045178.80 frames. 
], batch size: 58, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:18:24,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=477780.0, ans=0.125 2023-11-19 00:18:31,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=477846.6666666667, ans=0.0 2023-11-19 00:18:40,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=477913.3333333333, ans=0.1 2023-11-19 00:18:49,470 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 00:18:51,200 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=15.0 2023-11-19 00:19:01,077 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=15.0 2023-11-19 00:19:01,354 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.95 vs. limit=15.0 2023-11-19 00:19:02,125 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=22.5 2023-11-19 00:19:14,378 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 11600, loss[loss=0.1197, simple_loss=0.1379, pruned_loss=0.04017, audio_tagging_loss=0.01054, over 16920.00 frames. ], tot_loss[loss=0.09625, simple_loss=0.1127, pruned_loss=0.02889, audio_tagging_loss=0.01102, over 3045375.47 frames. ], batch size: 60, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:19:24,380 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.05 vs. limit=12.0 2023-11-19 00:19:50,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=478313.3333333333, ans=0.125 2023-11-19 00:19:53,318 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.43 vs. limit=22.5 2023-11-19 00:20:02,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=478380.0, ans=0.1 2023-11-19 00:20:02,995 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.140e+01 8.994e+01 9.981e+01 1.100e+02 1.554e+02, threshold=1.996e+02, percent-clipped=0.0 2023-11-19 00:20:05,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=478380.0, ans=0.125
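
Annotation: the WARNING entries like the one above drop AudioSet cuts whose transcript is a dummy placeholder: a 1-second clip has 100 feature frames, only 23 encoder frames survive subsampling, and a transducer loss cannot align 24 BPE tokens to 23 frames. A hedged sketch of such a validity filter; the subsampling arithmetic and helper names below are assumed stand-ins, not this recipe's exact code:

    # Sketch: drop cuts whose token sequence is longer than the number of
    # encoder frames remaining after subsampling. frames_after_subsampling
    # is an assumed ~4x reduction (it does map 100 -> 23), not necessarily
    # the formula this recipe uses; sp is a sentencepiece processor.
    import logging

    def frames_after_subsampling(num_frames: int) -> int:
        return (num_frames - 8) // 4  # hypothetical conv-context arithmetic

    def keep_cut(cut, sp) -> bool:
        tokens = sp.encode(cut.supervisions[0].text, out_type=str)
        t = frames_after_subsampling(cut.num_frames)
        if t < len(tokens):
            logging.warning(
                f"Exclude cut with ID {cut.id} from training. "
                f"Number of frames (before subsampling): {cut.num_frames}. "
                f"Number of frames (after subsampling): {t}. "
                f"Number of tokens: {len(tokens)}")
            return False
        return True

Applied lhotse-style this would look like cuts = cuts.filter(lambda c: keep_cut(c, sp)), evaluated lazily as batches are drawn, which is why the warnings surface mid-epoch.

2023-11-19 00:20:09,876 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 11650, loss[loss=0.07882, simple_loss=0.07822, pruned_loss=0.02601, audio_tagging_loss=0.0137, over 15088.00 frames. ], tot_loss[loss=0.09653, simple_loss=0.1133, pruned_loss=0.02883, audio_tagging_loss=0.01107, over 3042839.68 frames.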
], batch size: 59, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:20:25,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=478513.3333333333, ans=0.125 2023-11-19 00:20:32,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=478580.0, ans=0.125 2023-11-19 00:20:46,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=478646.6666666667, ans=0.0 2023-11-19 00:20:57,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=478713.3333333333, ans=0.04949747468305833 2023-11-19 00:21:06,350 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 11700, loss[loss=0.1347, simple_loss=0.1628, pruned_loss=0.04429, audio_tagging_loss=0.008988, over 15518.00 frames. ], tot_loss[loss=0.09663, simple_loss=0.1131, pruned_loss=0.02895, audio_tagging_loss=0.01113, over 3042053.88 frames. ], batch size: 56, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:21:13,687 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.92 vs. limit=22.5 2023-11-19 00:21:20,756 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=478846.6666666667, ans=0.07 2023-11-19 00:21:30,225 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:21:41,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=478980.0, ans=0.0 2023-11-19 00:21:48,911 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.76 vs. limit=15.0 2023-11-19 00:21:51,117 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.11 vs. limit=22.5 2023-11-19 00:21:55,329 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.970e+01 9.668e+01 1.084e+02 1.454e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-19 00:21:59,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=479046.6666666667, ans=0.125 2023-11-19 00:22:01,684 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 11750, loss[loss=0.08976, simple_loss=0.1074, pruned_loss=0.0258, audio_tagging_loss=0.01024, over 15533.00 frames. ], tot_loss[loss=0.0967, simple_loss=0.1131, pruned_loss=0.02902, audio_tagging_loss=0.01112, over 3042638.19 frames. ], batch size: 57, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:22:19,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=479180.0, ans=0.1 2023-11-19 00:22:21,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=479180.0, ans=0.0 2023-11-19 00:22:56,608 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.27 vs. 
limit=15.0 2023-11-19 00:22:56,856 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 11800, loss[loss=0.06112, simple_loss=0.06362, pruned_loss=0.01578, audio_tagging_loss=0.01353, over 14961.00 frames. ], tot_loss[loss=0.0969, simple_loss=0.1131, pruned_loss=0.02916, audio_tagging_loss=0.0112, over 3045709.36 frames. ], batch size: 57, lr: 1.10e-02, grad_scale: 8.0 2023-11-19 00:22:59,125 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.71 vs. limit=15.0 2023-11-19 00:23:19,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=479580.0, ans=0.0 2023-11-19 00:23:34,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=479646.6666666667, ans=0.2 2023-11-19 00:23:40,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=479713.3333333333, ans=0.125 2023-11-19 00:23:48,144 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 9.014e+01 9.704e+01 1.070e+02 1.627e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-19 00:23:53,376 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 11850, loss[loss=0.1179, simple_loss=0.1392, pruned_loss=0.03732, audio_tagging_loss=0.01097, over 14649.00 frames. ], tot_loss[loss=0.09685, simple_loss=0.1129, pruned_loss=0.02912, audio_tagging_loss=0.01127, over 3043028.29 frames. ], batch size: 54, lr: 1.10e-02, grad_scale: 8.0 2023-11-19 00:23:56,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=479780.0, ans=0.2 2023-11-19 00:24:12,380 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.90 vs. limit=22.5 2023-11-19 00:24:25,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=479980.0, ans=0.1 2023-11-19 00:24:40,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=480046.6666666667, ans=0.125 2023-11-19 00:24:50,960 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 11900, loss[loss=0.09743, simple_loss=0.1177, pruned_loss=0.02963, audio_tagging_loss=0.008946, over 14666.00 frames. ], tot_loss[loss=0.09757, simple_loss=0.1141, pruned_loss=0.02927, audio_tagging_loss=0.01125, over 3047841.10 frames. 
], batch size: 55, lr: 1.10e-02, grad_scale: 8.0 2023-11-19 00:24:52,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=480113.3333333333, ans=0.125 2023-11-19 00:24:55,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=480113.3333333333, ans=0.125 2023-11-19 00:25:10,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=480180.0, ans=0.5 2023-11-19 00:25:27,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=480313.3333333333, ans=0.1 2023-11-19 00:25:35,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=480380.0, ans=0.125 2023-11-19 00:25:36,675 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:25:37,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=480380.0, ans=0.0 2023-11-19 00:25:41,566 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 8.633e+01 9.352e+01 1.050e+02 1.397e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-19 00:25:42,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=480380.0, ans=0.125 2023-11-19 00:25:42,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=480380.0, ans=0.125 2023-11-19 00:25:43,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=480380.0, ans=0.95 2023-11-19 00:25:45,905 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 11950, loss[loss=0.1024, simple_loss=0.1213, pruned_loss=0.0308, audio_tagging_loss=0.01096, over 14981.00 frames. ], tot_loss[loss=0.09682, simple_loss=0.1132, pruned_loss=0.02885, audio_tagging_loss=0.01136, over 3046317.59 frames. ], batch size: 55, lr: 1.10e-02, grad_scale: 8.0 2023-11-19 00:25:53,872 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:25:55,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=480446.6666666667, ans=0.125 2023-11-19 00:26:02,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=480513.3333333333, ans=0.125 2023-11-19 00:26:13,517 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.62 vs. limit=15.0 2023-11-19 00:26:20,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=480646.6666666667, ans=0.0 2023-11-19 00:26:24,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=480646.6666666667, ans=0.125 2023-11-19 00:26:39,766 INFO [train_asr.py:1115] (3/4) Epoch 6, batch 12000, loss[loss=0.1018, simple_loss=0.1168, pruned_loss=0.03117, audio_tagging_loss=0.01223, over 14300.00 frames. ], tot_loss[loss=0.09681, simple_loss=0.113, pruned_loss=0.02893, audio_tagging_loss=0.01139, over 3040149.36 frames. 
], batch size: 56, lr: 1.10e-02, grad_scale: 16.0 2023-11-19 00:26:39,767 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-19 00:26:53,240 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.1974, 4.9423, 4.5298, 4.8927], device='cuda:3') 2023-11-19 00:27:10,548 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.8438, 3.7225, 3.6139, 3.6811], device='cuda:3') 2023-11-19 00:27:12,306 INFO [train_asr.py:1147] (3/4) Epoch 6, validation: loss=0.07011, simple_loss=0.05856, pruned_loss=0.008079, audio_tagging_loss=0.03275, over 4681554.00 frames. 2023-11-19 00:27:12,306 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-19 00:27:15,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=480780.0, ans=0.0 2023-11-19 00:28:10,671 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 0, loss[loss=0.09779, simple_loss=0.1003, pruned_loss=0.01955, audio_tagging_loss=0.02812, over 15573.00 frames. ], tot_loss[loss=0.09779, simple_loss=0.1003, pruned_loss=0.01955, audio_tagging_loss=0.02812, over 15573.00 frames. ], batch size: 59, lr: 1.03e-02, grad_scale: 32.0 2023-11-19 00:28:10,672 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-19 00:28:37,146 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.2769, 4.9761, 4.7583, 5.0863], device='cuda:3') 2023-11-19 00:28:42,238 INFO [train_asr.py:1147] (3/4) Epoch 7, validation: loss=0.06897, simple_loss=0.05854, pruned_loss=0.008004, audio_tagging_loss=0.03169, over 4681554.00 frames. 2023-11-19 00:28:42,239 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-19 00:28:58,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=480993.3333333333, ans=0.125 2023-11-19 00:29:06,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=481060.0, ans=0.0 2023-11-19 00:29:08,501 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.131e+01 8.969e+01 9.678e+01 1.084e+02 1.742e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-19 00:29:36,960 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 50, loss[loss=0.09178, simple_loss=0.1036, pruned_loss=0.01903, audio_tagging_loss=0.02093, over 14847.00 frames. ], tot_loss[loss=0.1035, simple_loss=0.1095, pruned_loss=0.02712, audio_tagging_loss=0.02168, over 684357.50 frames. ], batch size: 55, lr: 1.03e-02, grad_scale: 16.0
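
Annotation: each validation pass above also dumps attn_weights_entropy from [zipformer.py:1873], one entropy value per attention head of the named layer (four here). Values near log(seq_len) mean nearly uniform attention; smaller values mean sharper, more specialized heads. A generic sketch of the diagnostic; the tensor layout is an assumption, not the zipformer internals:

    # Sketch: mean entropy of softmaxed attention weights, one value per head.
    # Assumes attn_weights has shape (num_heads, batch, tgt_len, src_len) and
    # that rows already sum to 1; the layout is assumed for illustration.
    import torch

    def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
        p = attn_weights.clamp(min=1e-20)
        ent = -(p * p.log()).sum(dim=-1)  # entropy of each attention row
        return ent.mean(dim=(1, 2))       # average over batch and positions

    w = torch.softmax(torch.randn(4, 2, 50, 50), dim=-1)
    print(attn_weights_entropy(w))  # one entropy per head, below log(50) ~ 3.9

2023-11-19 00:29:50,514 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.69 vs.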
limit=22.5 2023-11-19 00:29:58,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=481393.3333333333, ans=0.0 2023-11-19 00:30:03,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=481393.3333333333, ans=0.2 2023-11-19 00:30:29,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=481526.6666666667, ans=0.125 2023-11-19 00:30:33,429 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 100, loss[loss=0.1103, simple_loss=0.1225, pruned_loss=0.0304, audio_tagging_loss=0.01859, over 15811.00 frames. ], tot_loss[loss=0.104, simple_loss=0.1109, pruned_loss=0.02768, audio_tagging_loss=0.02091, over 1205541.03 frames. ], batch size: 59, lr: 1.03e-02, grad_scale: 16.0 2023-11-19 00:30:35,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=481593.3333333333, ans=0.0 2023-11-19 00:30:46,793 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.24 vs. limit=15.0 2023-11-19 00:31:01,053 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.625e+01 8.882e+01 9.750e+01 1.051e+02 1.477e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-19 00:31:28,808 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 150, loss[loss=0.1087, simple_loss=0.126, pruned_loss=0.03428, audio_tagging_loss=0.01143, over 14342.00 frames. ], tot_loss[loss=0.1022, simple_loss=0.1114, pruned_loss=0.02797, audio_tagging_loss=0.01852, over 1608816.02 frames. ], batch size: 53, lr: 1.03e-02, grad_scale: 16.0 2023-11-19 00:31:34,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=481926.6666666667, ans=0.1 2023-11-19 00:31:35,311 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.70 vs. limit=22.5 2023-11-19 00:31:49,176 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:32:15,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=482193.3333333333, ans=0.125 2023-11-19 00:32:25,132 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 200, loss[loss=0.1123, simple_loss=0.1444, pruned_loss=0.03347, audio_tagging_loss=0.006625, over 14620.00 frames. ], tot_loss[loss=0.09954, simple_loss=0.1114, pruned_loss=0.02754, audio_tagging_loss=0.01633, over 1931036.80 frames. 
], batch size: 55, lr: 1.03e-02, grad_scale: 16.0 2023-11-19 00:32:44,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=482326.6666666667, ans=10.0 2023-11-19 00:32:48,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=482393.3333333333, ans=0.1 2023-11-19 00:32:50,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=482393.3333333333, ans=0.1 2023-11-19 00:32:52,545 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.645e+01 9.072e+01 1.001e+02 1.087e+02 1.831e+02, threshold=2.002e+02, percent-clipped=0.0 2023-11-19 00:32:53,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=482393.3333333333, ans=0.125 2023-11-19 00:32:57,013 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=482460.0, ans=0.1 2023-11-19 00:33:21,288 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 250, loss[loss=0.09622, simple_loss=0.1084, pruned_loss=0.03225, audio_tagging_loss=0.009794, over 14416.00 frames. ], tot_loss[loss=0.09964, simple_loss=0.1132, pruned_loss=0.02832, audio_tagging_loss=0.0147, over 2173246.93 frames. ], batch size: 58, lr: 1.03e-02, grad_scale: 16.0 2023-11-19 00:33:45,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=482726.6666666667, ans=0.0 2023-11-19 00:33:49,203 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.12 vs. limit=15.0 2023-11-19 00:33:52,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=482726.6666666667, ans=0.125 2023-11-19 00:33:55,493 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=15.0 2023-11-19 00:34:16,417 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 300, loss[loss=0.07817, simple_loss=0.09157, pruned_loss=0.02263, audio_tagging_loss=0.009756, over 14945.00 frames. ], tot_loss[loss=0.09739, simple_loss=0.1114, pruned_loss=0.028, audio_tagging_loss=0.0137, over 2361379.18 frames. ], batch size: 55, lr: 1.03e-02, grad_scale: 16.0 2023-11-19 00:34:16,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=482926.6666666667, ans=0.0 2023-11-19 00:34:33,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=482993.3333333333, ans=0.125 2023-11-19 00:34:44,700 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.825e+01 8.903e+01 9.554e+01 1.061e+02 1.704e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-19 00:35:05,535 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.65 vs. 
limit=10.0 2023-11-19 00:35:06,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=483193.3333333333, ans=0.125 2023-11-19 00:35:12,357 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 350, loss[loss=0.1034, simple_loss=0.1271, pruned_loss=0.02932, audio_tagging_loss=0.01055, over 15065.00 frames. ], tot_loss[loss=0.09704, simple_loss=0.1123, pruned_loss=0.02807, audio_tagging_loss=0.01284, over 2510402.67 frames. ], batch size: 56, lr: 1.02e-02, grad_scale: 16.0 2023-11-19 00:35:53,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=483460.0, ans=0.015 2023-11-19 00:36:07,561 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 400, loss[loss=0.07743, simple_loss=0.09899, pruned_loss=0.02014, audio_tagging_loss=0.007801, over 14362.00 frames. ], tot_loss[loss=0.09616, simple_loss=0.1119, pruned_loss=0.02796, audio_tagging_loss=0.01228, over 2632323.33 frames. ], batch size: 56, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:36:12,111 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=22.5 2023-11-19 00:36:13,550 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.22 vs. limit=15.0 2023-11-19 00:36:21,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=483660.0, ans=0.0 2023-11-19 00:36:34,359 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.424e+01 8.614e+01 9.359e+01 1.038e+02 1.564e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-19 00:36:37,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=483726.6666666667, ans=0.125 2023-11-19 00:36:39,706 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.40 vs. limit=15.0 2023-11-19 00:36:49,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=483793.3333333333, ans=0.1 2023-11-19 00:36:53,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=483860.0, ans=0.0
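
Annotation: notice how the lr field moves in these entries: 1.10e-02 through the end of epoch 6, 1.03e-02 at epoch 7 batches 0 through 300, then 1.02e-02 from around batch 350 onward. That decay in both the batch index and the (fractional) epoch matches the shape of icefall's Eden scheduler; the sketch below uses the commonly cited Eden rule with placeholder constants, and is not read off this run's configuration:

    # Sketch: Eden-style learning rate that decays in both batch and epoch,
    # lr = base_lr * ((b^2+B^2)/B^2)^-0.25 * ((e^2+E^2)/E^2)^-0.25.
    # Constants are illustrative defaults, not this run's verified settings.
    def eden_lr(base_lr: float, batch: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        batch_factor = ((batch ** 2 + lr_batches ** 2)
                        / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2)
                        / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # With an assumed base_lr of 0.045, batch ~72000 and epoch ~7 this lands
    # in the same ballpark as the 1.02e-02 seen above; both factors shrink
    # monotonically, so lr steps down within and across epochs.
    print(eden_lr(0.045, batch=72000, epoch=7.0))

2023-11-19 00:37:01,742 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 450, loss[loss=0.06317, simple_loss=0.07176, pruned_loss=0.01566, audio_tagging_loss=0.01163, over 14718.00 frames. ], tot_loss[loss=0.09565, simple_loss=0.1118, pruned_loss=0.02787, audio_tagging_loss=0.01189, over 2728803.31 frames.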
], batch size: 58, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:37:07,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=483926.6666666667, ans=0.125 2023-11-19 00:37:16,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=483993.3333333333, ans=0.0 2023-11-19 00:37:17,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=483993.3333333333, ans=0.0 2023-11-19 00:37:31,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=484060.0, ans=0.125 2023-11-19 00:37:32,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=484060.0, ans=0.2 2023-11-19 00:37:40,256 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.788e-01 2023-11-19 00:37:42,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=484126.6666666667, ans=0.125 2023-11-19 00:37:44,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=484126.6666666667, ans=0.125 2023-11-19 00:37:48,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=484193.3333333333, ans=0.125 2023-11-19 00:37:49,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=484193.3333333333, ans=0.1 2023-11-19 00:37:54,885 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=484193.3333333333, ans=0.125 2023-11-19 00:37:57,258 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 500, loss[loss=0.09954, simple_loss=0.1199, pruned_loss=0.02981, audio_tagging_loss=0.009762, over 15585.00 frames. ], tot_loss[loss=0.09508, simple_loss=0.111, pruned_loss=0.02783, audio_tagging_loss=0.01176, over 2804104.80 frames. ], batch size: 58, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:38:01,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=484260.0, ans=0.1 2023-11-19 00:38:05,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=484260.0, ans=0.0 2023-11-19 00:38:21,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=484393.3333333333, ans=0.1 2023-11-19 00:38:24,792 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.607e+01 8.485e+01 9.298e+01 1.059e+02 1.299e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 00:38:25,312 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.22 vs. 
limit=10.0 2023-11-19 00:38:31,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=484460.0, ans=0.1 2023-11-19 00:38:36,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=484460.0, ans=0.5 2023-11-19 00:38:43,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=484526.6666666667, ans=0.0 2023-11-19 00:38:45,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=484526.6666666667, ans=0.125 2023-11-19 00:38:50,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=484526.6666666667, ans=0.0 2023-11-19 00:38:51,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=484593.3333333333, ans=0.0 2023-11-19 00:38:52,406 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 550, loss[loss=0.0858, simple_loss=0.09423, pruned_loss=0.02526, audio_tagging_loss=0.01343, over 15400.00 frames. ], tot_loss[loss=0.09558, simple_loss=0.1116, pruned_loss=0.02818, audio_tagging_loss=0.01159, over 2849196.44 frames. ], batch size: 58, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:38:57,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=484593.3333333333, ans=0.2 2023-11-19 00:39:07,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=484660.0, ans=0.125 2023-11-19 00:39:30,710 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.26 vs. limit=6.0 2023-11-19 00:39:32,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=484793.3333333333, ans=0.1 2023-11-19 00:39:38,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=484860.0, ans=0.125 2023-11-19 00:39:42,279 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.33 vs. limit=10.0 2023-11-19 00:39:43,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=484860.0, ans=0.125 2023-11-19 00:39:44,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=484860.0, ans=0.0 2023-11-19 00:39:46,426 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.84 vs. limit=10.0 2023-11-19 00:39:48,097 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 600, loss[loss=0.07343, simple_loss=0.08148, pruned_loss=0.02086, audio_tagging_loss=0.01183, over 15644.00 frames. ], tot_loss[loss=0.0958, simple_loss=0.1121, pruned_loss=0.02822, audio_tagging_loss=0.01152, over 2891557.21 frames. 
], batch size: 59, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:40:07,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=484993.3333333333, ans=0.125 2023-11-19 00:40:15,507 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.402e+01 8.943e+01 9.833e+01 1.134e+02 1.508e+02, threshold=1.967e+02, percent-clipped=0.0 2023-11-19 00:40:17,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=485060.0, ans=0.0 2023-11-19 00:40:18,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=485060.0, ans=0.0 2023-11-19 00:40:21,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=485126.6666666667, ans=0.0 2023-11-19 00:40:41,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=485260.0, ans=0.1 2023-11-19 00:40:42,651 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 650, loss[loss=0.1043, simple_loss=0.128, pruned_loss=0.02954, audio_tagging_loss=0.01079, over 16180.00 frames. ], tot_loss[loss=0.09626, simple_loss=0.1131, pruned_loss=0.02838, audio_tagging_loss=0.01133, over 2930872.95 frames. ], batch size: 59, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:40:45,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=485260.0, ans=0.2 2023-11-19 00:41:13,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=485393.3333333333, ans=0.125 2023-11-19 00:41:31,535 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.97 vs. limit=15.0 2023-11-19 00:41:38,281 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 700, loss[loss=0.06836, simple_loss=0.07291, pruned_loss=0.01892, audio_tagging_loss=0.01299, over 13115.00 frames. ], tot_loss[loss=0.09583, simple_loss=0.1126, pruned_loss=0.02822, audio_tagging_loss=0.01133, over 2956557.12 frames. ], batch size: 54, lr: 1.02e-02, grad_scale: 16.0 2023-11-19 00:41:48,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=485660.0, ans=0.125 2023-11-19 00:41:50,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=485660.0, ans=0.125 2023-11-19 00:42:02,480 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:42:02,997 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.72 vs. limit=22.5 2023-11-19 00:42:06,501 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.220e+01 8.517e+01 9.340e+01 1.042e+02 1.556e+02, threshold=1.868e+02, percent-clipped=0.0
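
Annotation: the [scaling.py:1022] Whitening entries compare a per-module activation statistic against a limit (e.g. metric=2.97 vs. limit=15.0 above). The metric measures how far the feature covariance is from a multiple of the identity: 1.0 when activations are perfectly "white", larger when variance concentrates in a few directions. One plausible formulation is shown below; this is an assumption for illustration, not necessarily the exact scaling.py formula:

    # Sketch: a whitening metric equal to 1.0 for an isotropic covariance
    # and growing as variance concentrates. Assumed formulation only.
    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels); channels split evenly into groups
        n, c = x.shape
        d = c // num_groups
        x = x.reshape(n, num_groups, d).transpose(0, 1)   # (groups, n, d)
        x = x - x.mean(dim=1, keepdim=True)
        cov = torch.matmul(x.transpose(1, 2), x) / n      # per-group covariance
        tr = cov.diagonal(dim1=1, dim2=2).sum(dim=-1)     # sum of eigenvalues
        tr_sq = (cov * cov).sum(dim=(1, 2))               # sum of squared eigvals
        return (d * tr_sq / tr ** 2).mean().item()        # 1.0 if all eigvals equal

    x = torch.randn(1000, 256)      # near-white features
    print(whitening_metric(x))      # slightly above 1.0
    x[:, 0] *= 20.0                 # concentrate variance in one channel
    print(whitening_metric(x))      # much larger, like the logged metrics

When the metric exceeds its limit, a module of this kind would typically apply a corrective gradient term to push activations back toward whiteness, which is consistent with the "metric=X vs. limit=Y" phrasing of these log lines.

2023-11-19 00:42:09,622 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.34 vs.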
limit=15.0 2023-11-19 00:42:23,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=485860.0, ans=0.2 2023-11-19 00:42:33,685 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 750, loss[loss=0.06075, simple_loss=0.06573, pruned_loss=0.01567, audio_tagging_loss=0.01222, over 14715.00 frames. ], tot_loss[loss=0.09517, simple_loss=0.1117, pruned_loss=0.02807, audio_tagging_loss=0.01128, over 2981709.95 frames. ], batch size: 57, lr: 1.02e-02, grad_scale: 16.0 2023-11-19 00:42:39,496 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.37 vs. limit=22.5 2023-11-19 00:42:40,661 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.98 vs. limit=15.0 2023-11-19 00:43:20,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=486193.3333333333, ans=0.125 2023-11-19 00:43:23,215 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.53 vs. limit=10.0 2023-11-19 00:43:28,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=486260.0, ans=0.125 2023-11-19 00:43:28,902 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 800, loss[loss=0.089, simple_loss=0.1162, pruned_loss=0.02101, audio_tagging_loss=0.009875, over 15741.00 frames. ], tot_loss[loss=0.09605, simple_loss=0.1126, pruned_loss=0.02843, audio_tagging_loss=0.01129, over 2991427.46 frames. ], batch size: 58, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:43:32,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=486260.0, ans=0.0 2023-11-19 00:43:42,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=486326.6666666667, ans=0.125 2023-11-19 00:43:53,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=486393.3333333333, ans=0.125 2023-11-19 00:43:58,561 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.127e+01 8.961e+01 9.604e+01 1.088e+02 1.734e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-19 00:44:01,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=486393.3333333333, ans=0.0 2023-11-19 00:44:05,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=486460.0, ans=0.125 2023-11-19 00:44:08,406 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=486460.0, ans=0.125 2023-11-19 00:44:24,838 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 850, loss[loss=0.1133, simple_loss=0.1273, pruned_loss=0.03851, audio_tagging_loss=0.0111, over 14626.00 frames. ], tot_loss[loss=0.09583, simple_loss=0.1122, pruned_loss=0.02836, audio_tagging_loss=0.01135, over 3005651.18 frames. 
], batch size: 55, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:44:27,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=486593.3333333333, ans=0.125 2023-11-19 00:44:46,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=486660.0, ans=0.0 2023-11-19 00:44:52,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=486726.6666666667, ans=0.09899494936611666 2023-11-19 00:44:55,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=486726.6666666667, ans=0.0 2023-11-19 00:45:01,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=486793.3333333333, ans=0.0 2023-11-19 00:45:03,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=486793.3333333333, ans=0.025 2023-11-19 00:45:07,118 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.63 vs. limit=15.0 2023-11-19 00:45:13,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=486860.0, ans=0.125 2023-11-19 00:45:21,333 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 900, loss[loss=0.0826, simple_loss=0.08382, pruned_loss=0.02702, audio_tagging_loss=0.01367, over 15735.00 frames. ], tot_loss[loss=0.09621, simple_loss=0.1126, pruned_loss=0.02852, audio_tagging_loss=0.01136, over 3024177.66 frames. ], batch size: 63, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:45:26,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=486926.6666666667, ans=0.0 2023-11-19 00:45:34,480 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.13 vs. limit=22.5 2023-11-19 00:45:48,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=487060.0, ans=0.0 2023-11-19 00:45:49,160 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.581e+01 9.444e+01 1.025e+02 1.382e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-19 00:46:11,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=487193.3333333333, ans=0.125 2023-11-19 00:46:16,167 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 950, loss[loss=0.08476, simple_loss=0.1052, pruned_loss=0.02168, audio_tagging_loss=0.0105, over 15633.00 frames. ], tot_loss[loss=0.096, simple_loss=0.1129, pruned_loss=0.02834, audio_tagging_loss=0.0112, over 3028331.63 frames. ], batch size: 61, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:46:28,733 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. limit=6.0 2023-11-19 00:46:34,728 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=12.00 vs. 
limit=12.0 2023-11-19 00:46:48,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=487393.3333333333, ans=0.0 2023-11-19 00:47:09,045 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.35 vs. limit=15.0 2023-11-19 00:47:11,645 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 1000, loss[loss=0.08832, simple_loss=0.1012, pruned_loss=0.02627, audio_tagging_loss=0.01143, over 14798.00 frames. ], tot_loss[loss=0.09453, simple_loss=0.1114, pruned_loss=0.02786, audio_tagging_loss=0.01098, over 3034795.22 frames. ], batch size: 57, lr: 1.02e-02, grad_scale: 16.0 2023-11-19 00:47:13,319 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.81 vs. limit=10.0 2023-11-19 00:47:25,474 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=487660.0, ans=0.035 2023-11-19 00:47:35,289 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 00:47:41,637 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.644e+01 9.195e+01 1.009e+02 1.438e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-19 00:47:43,098 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.35 vs. limit=15.0 2023-11-19 00:47:56,358 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.15 vs. limit=10.0 2023-11-19 00:47:59,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=487860.0, ans=0.125 2023-11-19 00:48:01,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=487860.0, ans=0.125 2023-11-19 00:48:07,471 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 1050, loss[loss=0.06971, simple_loss=0.08356, pruned_loss=0.01741, audio_tagging_loss=0.01052, over 14474.00 frames. ], tot_loss[loss=0.09485, simple_loss=0.1118, pruned_loss=0.02807, audio_tagging_loss=0.01089, over 3033233.43 frames. ], batch size: 55, lr: 1.02e-02, grad_scale: 16.0 2023-11-19 00:48:09,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=487926.6666666667, ans=0.0 2023-11-19 00:48:17,294 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.04 vs. 
limit=22.5 2023-11-19 00:48:30,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=488060.0, ans=0.125 2023-11-19 00:48:33,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=488060.0, ans=0.2 2023-11-19 00:48:35,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=488060.0, ans=0.0 2023-11-19 00:48:36,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=488060.0, ans=0.125 2023-11-19 00:48:38,959 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.37 vs. limit=15.0 2023-11-19 00:48:58,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=488193.3333333333, ans=0.125 2023-11-19 00:48:59,683 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.05 vs. limit=12.0 2023-11-19 00:49:03,267 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 1100, loss[loss=0.08068, simple_loss=0.1024, pruned_loss=0.01823, audio_tagging_loss=0.01124, over 15864.00 frames. ], tot_loss[loss=0.09408, simple_loss=0.1109, pruned_loss=0.02778, audio_tagging_loss=0.01086, over 3032354.54 frames. ], batch size: 57, lr: 1.02e-02, grad_scale: 16.0 2023-11-19 00:49:03,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=488260.0, ans=0.125 2023-11-19 00:49:06,403 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 00:49:13,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=488326.6666666667, ans=0.125 2023-11-19 00:49:16,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=488326.6666666667, ans=0.125 2023-11-19 00:49:22,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=488326.6666666667, ans=0.0 2023-11-19 00:49:26,569 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.51 vs. limit=22.5 2023-11-19 00:49:33,461 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.724e+01 8.821e+01 9.518e+01 1.052e+02 1.526e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-19 00:49:44,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=488460.0, ans=0.125 2023-11-19 00:49:58,981 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 1150, loss[loss=0.1294, simple_loss=0.1592, pruned_loss=0.04112, audio_tagging_loss=0.008663, over 15616.00 frames. ], tot_loss[loss=0.09416, simple_loss=0.1111, pruned_loss=0.02778, audio_tagging_loss=0.0108, over 3033053.47 frames. 
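A note on how the logged components fit together: in every entry of this run the total is consistent with tot_loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss. The 0.5 and 1.0 weights below are inferred from the printed numbers themselves, not read out of train_asr.py, so treat this as an observation about the log rather than the definitive formula.

```python
# Recombining the components of the batch 1100 entry above; the 0.5 weight on
# simple_loss and the 1.0 weight on audio_tagging_loss are inferred, not quoted
# from the training script.
simple_loss, pruned_loss, audio_tagging_loss = 0.1109, 0.02778, 0.01086
tot = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
print(f"{tot:.5f}")  # 0.09409, matching the logged tot_loss=0.09408 up to rounding
```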
], batch size: 57, lr: 1.02e-02, grad_scale: 16.0 2023-11-19 00:50:17,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=488660.0, ans=0.1 2023-11-19 00:50:24,804 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=488726.6666666667, ans=0.2 2023-11-19 00:50:32,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=488793.3333333333, ans=0.1 2023-11-19 00:50:42,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=488860.0, ans=0.0 2023-11-19 00:50:46,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=488860.0, ans=0.125 2023-11-19 00:50:55,540 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 1200, loss[loss=0.08283, simple_loss=0.08981, pruned_loss=0.02608, audio_tagging_loss=0.01185, over 14865.00 frames. ], tot_loss[loss=0.0935, simple_loss=0.1106, pruned_loss=0.02754, audio_tagging_loss=0.01066, over 3035572.47 frames. ], batch size: 56, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:51:25,295 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 8.775e+01 9.458e+01 1.050e+02 1.338e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-19 00:51:28,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=489126.6666666667, ans=0.07 2023-11-19 00:51:45,978 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.14 vs. limit=10.0 2023-11-19 00:51:50,845 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 1250, loss[loss=0.1067, simple_loss=0.1267, pruned_loss=0.03407, audio_tagging_loss=0.009233, over 14226.00 frames. ], tot_loss[loss=0.0945, simple_loss=0.1119, pruned_loss=0.02793, audio_tagging_loss=0.01062, over 3039495.75 frames. ], batch size: 52, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:51:53,962 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.71 vs. limit=15.0 2023-11-19 00:52:17,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=489393.3333333333, ans=0.125 2023-11-19 00:52:21,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=489393.3333333333, ans=0.07 2023-11-19 00:52:47,232 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 1300, loss[loss=0.0902, simple_loss=0.1009, pruned_loss=0.02944, audio_tagging_loss=0.01032, over 15744.00 frames. ], tot_loss[loss=0.09418, simple_loss=0.1116, pruned_loss=0.0278, audio_tagging_loss=0.01059, over 3042740.64 frames. ], batch size: 58, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:52:53,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=489593.3333333333, ans=0.1 2023-11-19 00:53:09,058 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.84 vs. 
limit=15.0 2023-11-19 00:53:17,089 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.399e+01 8.502e+01 9.491e+01 1.040e+02 1.421e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-19 00:53:27,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=489793.3333333333, ans=0.2 2023-11-19 00:53:43,612 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 1350, loss[loss=0.1233, simple_loss=0.1572, pruned_loss=0.03545, audio_tagging_loss=0.009251, over 16325.00 frames. ], tot_loss[loss=0.09494, simple_loss=0.1124, pruned_loss=0.02806, audio_tagging_loss=0.01068, over 3045315.92 frames. ], batch size: 60, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:53:46,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=489926.6666666667, ans=0.0 2023-11-19 00:53:49,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=489926.6666666667, ans=0.2 2023-11-19 00:54:10,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.14 vs. limit=15.0 2023-11-19 00:54:13,429 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:54:16,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=490126.6666666667, ans=0.125 2023-11-19 00:54:18,738 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:54:21,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=490126.6666666667, ans=0.2 2023-11-19 00:54:24,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=490126.6666666667, ans=0.0 2023-11-19 00:54:24,944 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 00:54:28,375 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:54:38,694 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 1400, loss[loss=0.1118, simple_loss=0.1241, pruned_loss=0.03881, audio_tagging_loss=0.01089, over 15135.00 frames. ], tot_loss[loss=0.09474, simple_loss=0.112, pruned_loss=0.02798, audio_tagging_loss=0.01075, over 3041984.29 frames. 
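The periodic optim.py:476 lines summarize recent gradient norms. Across this log the five numbers read naturally as min/25%/median/75%/max, and the reported threshold always equals Clipping_scale times the middle value, i.e. the clipping threshold tracks twice the median gradient norm. A quick check against the batch 1300 entry above (an inference from the printed values, not a quote of optim.py):

```python
# Five-number summary printed by the clipping line above.
grad_norm_summary = [6.399e+01, 8.502e+01, 9.491e+01, 1.040e+02, 1.421e+02]
clipping_scale = 2.0
threshold = clipping_scale * grad_norm_summary[2]  # 2.0 * median
print(threshold)  # 189.82 ~ the logged threshold=1.898e+02; percent-clipped=0.0
```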
], batch size: 60, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:54:39,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=490260.0, ans=0.125 2023-11-19 00:55:09,548 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 8.650e+01 9.368e+01 1.053e+02 1.666e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-19 00:55:21,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=490460.0, ans=0.2 2023-11-19 00:55:35,058 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 1450, loss[loss=0.1044, simple_loss=0.1326, pruned_loss=0.02911, audio_tagging_loss=0.009016, over 15512.00 frames. ], tot_loss[loss=0.09487, simple_loss=0.1121, pruned_loss=0.02805, audio_tagging_loss=0.01079, over 3043097.49 frames. ], batch size: 58, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:55:43,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=490593.3333333333, ans=0.0 2023-11-19 00:55:47,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=490660.0, ans=0.07 2023-11-19 00:55:51,408 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:55:54,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=490660.0, ans=0.125 2023-11-19 00:56:12,692 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.84 vs. limit=10.0 2023-11-19 00:56:23,886 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.58 vs. limit=22.5 2023-11-19 00:56:30,785 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 1500, loss[loss=0.0976, simple_loss=0.117, pruned_loss=0.02684, audio_tagging_loss=0.01224, over 15863.00 frames. ], tot_loss[loss=0.0947, simple_loss=0.1117, pruned_loss=0.02791, audio_tagging_loss=0.01092, over 3044353.52 frames. ], batch size: 58, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:56:31,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=490926.6666666667, ans=0.015 2023-11-19 00:56:35,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=490926.6666666667, ans=0.0 2023-11-19 00:56:58,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=491060.0, ans=0.0 2023-11-19 00:56:59,168 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.39 vs. 
limit=22.5 2023-11-19 00:57:00,322 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.558e+01 8.708e+01 9.682e+01 1.052e+02 1.356e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-19 00:57:04,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=491126.6666666667, ans=0.5 2023-11-19 00:57:13,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=491126.6666666667, ans=0.0 2023-11-19 00:57:16,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=491193.3333333333, ans=0.125 2023-11-19 00:57:25,674 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 1550, loss[loss=0.108, simple_loss=0.1392, pruned_loss=0.03023, audio_tagging_loss=0.008132, over 15809.00 frames. ], tot_loss[loss=0.0941, simple_loss=0.1106, pruned_loss=0.02769, audio_tagging_loss=0.0111, over 3037608.25 frames. ], batch size: 58, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:57:28,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=491260.0, ans=0.125 2023-11-19 00:57:30,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=491260.0, ans=0.125 2023-11-19 00:57:38,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=491326.6666666667, ans=0.0 2023-11-19 00:57:49,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=491393.3333333333, ans=0.04949747468305833 2023-11-19 00:57:56,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=491393.3333333333, ans=0.125 2023-11-19 00:58:15,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=491526.6666666667, ans=0.125 2023-11-19 00:58:20,576 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 1600, loss[loss=0.08779, simple_loss=0.1, pruned_loss=0.02322, audio_tagging_loss=0.01455, over 16298.00 frames. ], tot_loss[loss=0.09393, simple_loss=0.1102, pruned_loss=0.02757, audio_tagging_loss=0.01127, over 3039639.57 frames. ], batch size: 61, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:58:29,348 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0 2023-11-19 00:58:37,802 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.89 vs. 
limit=15.0 2023-11-19 00:58:41,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=491660.0, ans=0.125 2023-11-19 00:58:42,765 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:58:44,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=491726.6666666667, ans=0.0 2023-11-19 00:58:50,951 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.136e+01 8.782e+01 9.645e+01 1.086e+02 1.733e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-19 00:59:10,759 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.87 vs. limit=15.0 2023-11-19 00:59:17,058 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 1650, loss[loss=0.07011, simple_loss=0.08235, pruned_loss=0.01834, audio_tagging_loss=0.0106, over 14051.00 frames. ], tot_loss[loss=0.09442, simple_loss=0.1108, pruned_loss=0.02784, audio_tagging_loss=0.01118, over 3037090.99 frames. ], batch size: 53, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:59:21,462 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.89 vs. limit=22.5 2023-11-19 00:59:28,520 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.48 vs. limit=12.0 2023-11-19 00:59:36,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=491993.3333333333, ans=0.0 2023-11-19 01:00:12,744 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 1700, loss[loss=0.09607, simple_loss=0.1082, pruned_loss=0.03063, audio_tagging_loss=0.01133, over 15812.00 frames. ], tot_loss[loss=0.09415, simple_loss=0.1106, pruned_loss=0.02762, audio_tagging_loss=0.01125, over 3043102.03 frames. ], batch size: 58, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 01:00:24,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=492326.6666666667, ans=0.2 2023-11-19 01:00:43,065 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.597e+01 8.546e+01 9.504e+01 1.048e+02 1.501e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-19 01:00:57,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=492526.6666666667, ans=0.2 2023-11-19 01:01:06,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=492526.6666666667, ans=0.0 2023-11-19 01:01:08,023 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 1750, loss[loss=0.09153, simple_loss=0.09564, pruned_loss=0.03092, audio_tagging_loss=0.01278, over 15970.00 frames. ], tot_loss[loss=0.09386, simple_loss=0.1102, pruned_loss=0.02752, audio_tagging_loss=0.01125, over 3044111.97 frames. ], batch size: 63, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 01:01:17,537 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.29 vs. limit=10.0 2023-11-19 01:01:22,411 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.55 vs. 
limit=15.0 2023-11-19 01:01:36,042 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 01:01:44,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=492793.3333333333, ans=0.125 2023-11-19 01:01:45,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=492793.3333333333, ans=0.125 2023-11-19 01:02:04,397 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 1800, loss[loss=0.09009, simple_loss=0.1012, pruned_loss=0.02744, audio_tagging_loss=0.01206, over 15214.00 frames. ], tot_loss[loss=0.09415, simple_loss=0.1108, pruned_loss=0.02761, audio_tagging_loss=0.01115, over 3039882.29 frames. ], batch size: 58, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:02:17,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=492993.3333333333, ans=0.1 2023-11-19 01:02:18,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=492993.3333333333, ans=0.1 2023-11-19 01:02:20,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=492993.3333333333, ans=0.125 2023-11-19 01:02:24,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=492993.3333333333, ans=0.04949747468305833 2023-11-19 01:02:34,107 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.111e+01 8.598e+01 9.390e+01 1.040e+02 1.619e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-19 01:02:35,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=493060.0, ans=0.2 2023-11-19 01:03:00,480 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 1850, loss[loss=0.08698, simple_loss=0.1033, pruned_loss=0.02494, audio_tagging_loss=0.01041, over 15502.00 frames. ], tot_loss[loss=0.09392, simple_loss=0.1107, pruned_loss=0.02742, audio_tagging_loss=0.01117, over 3049437.22 frames. ], batch size: 57, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:03:13,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=493326.6666666667, ans=0.015 2023-11-19 01:03:23,582 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 01:03:40,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=493460.0, ans=0.0 2023-11-19 01:03:55,761 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 1900, loss[loss=0.09224, simple_loss=0.1093, pruned_loss=0.02546, audio_tagging_loss=0.01212, over 15069.00 frames. ], tot_loss[loss=0.09344, simple_loss=0.1104, pruned_loss=0.02717, audio_tagging_loss=0.01108, over 3050812.56 frames. 
], batch size: 57, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:04:13,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=493660.0, ans=0.2 2023-11-19 01:04:25,810 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.583e+01 8.519e+01 9.193e+01 1.005e+02 1.310e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-19 01:04:33,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=493793.3333333333, ans=0.125 2023-11-19 01:04:49,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=493926.6666666667, ans=0.0 2023-11-19 01:04:50,848 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 1950, loss[loss=0.1053, simple_loss=0.1319, pruned_loss=0.0296, audio_tagging_loss=0.00972, over 14728.00 frames. ], tot_loss[loss=0.09317, simple_loss=0.1099, pruned_loss=0.02708, audio_tagging_loss=0.01112, over 3044470.80 frames. ], batch size: 55, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:05:19,531 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.09 vs. limit=15.0 2023-11-19 01:05:20,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=494060.0, ans=0.1 2023-11-19 01:05:29,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=494126.6666666667, ans=0.125 2023-11-19 01:05:35,018 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.60 vs. limit=15.0 2023-11-19 01:05:47,464 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 2000, loss[loss=0.08443, simple_loss=0.08932, pruned_loss=0.02673, audio_tagging_loss=0.01304, over 13864.00 frames. ], tot_loss[loss=0.09314, simple_loss=0.1098, pruned_loss=0.02721, audio_tagging_loss=0.01104, over 3043618.52 frames. 
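The recurring WARNING lines all describe the same situation: a 1-second AudioSet clip (100 feature frames) carries a 24-token dummy transcript, but after the encoder's roughly 4x subsampling only 23 output frames remain, and a transducer cannot emit more tokens than it has frames, so the cut is dropped. A plausible reconstruction of the filter behind these warnings (hypothetical helper; the real predicate lives in train_asr.py):

```python
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Drop cuts whose transcript is longer than the subsampled output.

    The frame arithmetic is an assumption: one common convolutional front-end
    maps T input frames to ((T - 7) // 2 + 1) // 2 outputs, which reproduces
    the 100 -> 23 mapping printed in the warnings.
    """
    frames_after_subsampling = ((num_frames - 7) // 2 + 1) // 2
    return frames_after_subsampling >= num_tokens

print(keep_cut(100, 24))  # False -> "Exclude cut ... from training." is logged
```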
], batch size: 54, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:05:50,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=494260.0, ans=0.125 2023-11-19 01:05:54,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=494260.0, ans=0.0 2023-11-19 01:06:03,530 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=494326.6666666667, ans=0.125 2023-11-19 01:06:05,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=494326.6666666667, ans=0.125 2023-11-19 01:06:16,506 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.030e+01 8.706e+01 9.238e+01 1.036e+02 1.404e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-19 01:06:18,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=494393.3333333333, ans=0.125 2023-11-19 01:06:34,317 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=494526.6666666667, ans=0.1 2023-11-19 01:06:35,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=494526.6666666667, ans=0.0 2023-11-19 01:06:38,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=494526.6666666667, ans=0.125 2023-11-19 01:06:42,718 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 2050, loss[loss=0.0878, simple_loss=0.09256, pruned_loss=0.0273, audio_tagging_loss=0.01421, over 16193.00 frames. ], tot_loss[loss=0.09369, simple_loss=0.1104, pruned_loss=0.02745, audio_tagging_loss=0.01104, over 3041054.70 frames. ], batch size: 62, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:06:45,416 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.55 vs. limit=22.5 2023-11-19 01:06:48,945 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.44 vs. limit=10.0 2023-11-19 01:06:49,291 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 01:07:01,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=494660.0, ans=0.0 2023-11-19 01:07:03,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=494660.0, ans=0.0 2023-11-19 01:07:20,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=494793.3333333333, ans=0.125 2023-11-19 01:07:34,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=494860.0, ans=0.125 2023-11-19 01:07:39,055 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 2100, loss[loss=0.07737, simple_loss=0.09665, pruned_loss=0.01804, audio_tagging_loss=0.01099, over 15827.00 frames. ], tot_loss[loss=0.0938, simple_loss=0.1109, pruned_loss=0.02746, audio_tagging_loss=0.0109, over 3038016.11 frames. 
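Most of the scaling.py:213 traffic is ScheduledFloat reporting the current value (ans=...) of a hyperparameter that is scheduled against batch_count; skip rates, balancer probabilities, and dropout all move this way. A minimal sketch, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints below are illustrative, not taken from scaling.py:

```python
def scheduled_float(batch_count: float, points: list[tuple[float, float]]) -> float:
    """Piecewise-linear schedule over ascending (batch_count, value) points."""
    x0, y0 = points[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in points[1:]:
        if batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        x0, y0 = x1, y1
    return y0  # past the last breakpoint the value is held constant

# e.g. a skip rate annealed from 0.3 to 0.0 over the first 20k batches has
# long since settled by the batch counts logged here:
print(scheduled_float(494_860.0, [(0.0, 0.3), (20_000.0, 0.0)]))  # 0.0
```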
], batch size: 58, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:07:49,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=494926.6666666667, ans=0.0 2023-11-19 01:08:01,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=495060.0, ans=10.0 2023-11-19 01:08:03,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=495060.0, ans=0.0 2023-11-19 01:08:05,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=495060.0, ans=0.2 2023-11-19 01:08:09,432 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.837e+01 9.503e+01 1.029e+02 1.417e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-19 01:08:15,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=495126.6666666667, ans=0.125 2023-11-19 01:08:26,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=495193.3333333333, ans=0.125 2023-11-19 01:08:32,668 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=495193.3333333333, ans=0.125 2023-11-19 01:08:35,531 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 2150, loss[loss=0.08132, simple_loss=0.08749, pruned_loss=0.02526, audio_tagging_loss=0.01232, over 14820.00 frames. ], tot_loss[loss=0.09368, simple_loss=0.1107, pruned_loss=0.0275, audio_tagging_loss=0.01086, over 3042389.43 frames. ], batch size: 56, lr: 1.01e-02, grad_scale: 16.0 2023-11-19 01:08:53,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=495326.6666666667, ans=0.1 2023-11-19 01:08:54,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=495326.6666666667, ans=0.0 2023-11-19 01:09:03,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=495393.3333333333, ans=0.125 2023-11-19 01:09:10,120 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 01:09:12,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=495460.0, ans=0.1 2023-11-19 01:09:16,339 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.95 vs. 
limit=22.5 2023-11-19 01:09:27,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=495526.6666666667, ans=0.1 2023-11-19 01:09:29,487 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=495526.6666666667, ans=0.0 2023-11-19 01:09:30,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=495593.3333333333, ans=0.035 2023-11-19 01:09:31,323 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 2200, loss[loss=0.0849, simple_loss=0.1098, pruned_loss=0.02, audio_tagging_loss=0.009989, over 14869.00 frames. ], tot_loss[loss=0.09461, simple_loss=0.1119, pruned_loss=0.02782, audio_tagging_loss=0.01086, over 3046906.74 frames. ], batch size: 56, lr: 1.01e-02, grad_scale: 16.0 2023-11-19 01:09:34,875 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.66 vs. limit=15.0 2023-11-19 01:09:39,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=495593.3333333333, ans=0.125 2023-11-19 01:09:45,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=495660.0, ans=0.0 2023-11-19 01:09:57,524 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.40 vs. limit=10.0 2023-11-19 01:10:02,324 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.926e+01 8.544e+01 9.673e+01 1.053e+02 1.527e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-19 01:10:05,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=495793.3333333333, ans=0.1 2023-11-19 01:10:06,801 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.032e-02 2023-11-19 01:10:18,250 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.43 vs. limit=22.5 2023-11-19 01:10:26,496 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 2250, loss[loss=0.05708, simple_loss=0.06065, pruned_loss=0.01575, audio_tagging_loss=0.01101, over 15628.00 frames. ], tot_loss[loss=0.09349, simple_loss=0.1102, pruned_loss=0.02746, audio_tagging_loss=0.01094, over 3042235.62 frames. ], batch size: 63, lr: 1.01e-02, grad_scale: 16.0 2023-11-19 01:10:26,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=495926.6666666667, ans=0.125 2023-11-19 01:10:39,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=495993.3333333333, ans=0.0 2023-11-19 01:10:58,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=496060.0, ans=0.125 2023-11-19 01:11:07,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=496126.6666666667, ans=0.2 2023-11-19 01:11:23,084 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 2300, loss[loss=0.1184, simple_loss=0.1387, pruned_loss=0.03798, audio_tagging_loss=0.01111, over 15189.00 frames. 
], tot_loss[loss=0.09443, simple_loss=0.1114, pruned_loss=0.02782, audio_tagging_loss=0.01089, over 3048360.71 frames. ], batch size: 58, lr: 1.01e-02, grad_scale: 16.0 2023-11-19 01:11:41,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=496326.6666666667, ans=0.0 2023-11-19 01:11:43,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=496393.3333333333, ans=0.1 2023-11-19 01:11:53,801 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 9.064e+01 9.790e+01 1.107e+02 1.454e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-19 01:12:01,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=496460.0, ans=0.0 2023-11-19 01:12:07,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=496526.6666666667, ans=0.07 2023-11-19 01:12:12,946 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 01:12:13,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=496526.6666666667, ans=0.1 2023-11-19 01:12:18,212 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 2350, loss[loss=0.08268, simple_loss=0.08918, pruned_loss=0.02423, audio_tagging_loss=0.01386, over 15456.00 frames. ], tot_loss[loss=0.09376, simple_loss=0.1103, pruned_loss=0.02752, audio_tagging_loss=0.0111, over 3050641.96 frames. ], batch size: 57, lr: 1.01e-02, grad_scale: 16.0 2023-11-19 01:12:33,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=496660.0, ans=0.125 2023-11-19 01:13:05,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=496860.0, ans=0.09899494936611666 2023-11-19 01:13:10,387 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.92 vs. limit=12.0 2023-11-19 01:13:14,597 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 2400, loss[loss=0.09191, simple_loss=0.1121, pruned_loss=0.02681, audio_tagging_loss=0.009029, over 14974.00 frames. ], tot_loss[loss=0.09498, simple_loss=0.112, pruned_loss=0.02786, audio_tagging_loss=0.01114, over 3055610.94 frames. 
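The scaling.py:1022 Whitening lines fire when a module's activation covariance drifts away from being white (isotropic): metric is a whiteness measure and limit is the point past which a corrective penalty applies, so an entry like "metric=6.92 vs. limit=12.0" is well inside bounds while large ratios flag strongly correlated channels. A toy version of such a metric, using an eigenvalue-spread formulation that only approximates the scaling.py definition:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """Whiteness of (num_frames, num_channels) activations: 1.0 means the
    covariance is already isotropic; larger values mean more channel spread.
    An illustrative stand-in for the logged metric, not its exact formula."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

# Channels with unequal variance are not white, so the metric exceeds 1.0:
x = torch.randn(1000, 384) * torch.linspace(0.5, 2.0, 384)
print(whitening_metric(x))  # > 1.0; the log prints when metric nears its limit
```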
], batch size: 55, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:13:25,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=496993.3333333333, ans=0.125 2023-11-19 01:13:45,237 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.178e+01 8.739e+01 9.521e+01 1.008e+02 1.350e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-19 01:14:02,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=497193.3333333333, ans=0.125 2023-11-19 01:14:10,594 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 2450, loss[loss=0.08611, simple_loss=0.1032, pruned_loss=0.02054, audio_tagging_loss=0.01396, over 15755.00 frames. ], tot_loss[loss=0.09471, simple_loss=0.1115, pruned_loss=0.02772, audio_tagging_loss=0.01123, over 3056819.49 frames. ], batch size: 59, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:14:44,035 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=497460.0, ans=0.125 2023-11-19 01:14:44,952 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 01:14:49,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=497460.0, ans=0.025 2023-11-19 01:14:49,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=497460.0, ans=0.0 2023-11-19 01:14:51,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=497460.0, ans=0.1 2023-11-19 01:15:06,178 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 2500, loss[loss=0.09165, simple_loss=0.1124, pruned_loss=0.02604, audio_tagging_loss=0.009398, over 15904.00 frames. ], tot_loss[loss=0.09512, simple_loss=0.1122, pruned_loss=0.02796, audio_tagging_loss=0.01108, over 3058338.81 frames. ], batch size: 58, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:15:14,666 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.35 vs. limit=10.0 2023-11-19 01:15:15,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=497593.3333333333, ans=0.2 2023-11-19 01:15:20,816 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 01:15:37,853 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.172e+01 8.612e+01 9.372e+01 1.003e+02 1.252e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-19 01:15:59,271 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.74 vs. limit=10.0 2023-11-19 01:16:01,809 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 2550, loss[loss=0.07716, simple_loss=0.09042, pruned_loss=0.01963, audio_tagging_loss=0.01231, over 14712.00 frames. ], tot_loss[loss=0.09501, simple_loss=0.1121, pruned_loss=0.02792, audio_tagging_loss=0.01104, over 3048584.57 frames. ], batch size: 56, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:16:07,081 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.56 vs. 
limit=15.0 2023-11-19 01:16:14,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=497993.3333333333, ans=0.125 2023-11-19 01:16:57,739 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 2600, loss[loss=0.054, simple_loss=0.0522, pruned_loss=0.01506, audio_tagging_loss=0.01284, over 16257.00 frames. ], tot_loss[loss=0.0942, simple_loss=0.1113, pruned_loss=0.0276, audio_tagging_loss=0.01094, over 3052459.09 frames. ], batch size: 65, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:17:28,699 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.310e+01 9.022e+01 9.947e+01 2.048e+02, threshold=1.804e+02, percent-clipped=1.0 2023-11-19 01:17:28,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=498393.3333333333, ans=0.125 2023-11-19 01:17:42,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=498526.6666666667, ans=0.125 2023-11-19 01:17:53,113 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 2650, loss[loss=0.08589, simple_loss=0.09567, pruned_loss=0.02706, audio_tagging_loss=0.011, over 14828.00 frames. ], tot_loss[loss=0.0948, simple_loss=0.1122, pruned_loss=0.02787, audio_tagging_loss=0.01082, over 3054528.12 frames. ], batch size: 57, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:17:59,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=498593.3333333333, ans=0.0 2023-11-19 01:18:31,661 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 01:18:33,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=498793.3333333333, ans=0.0 2023-11-19 01:18:39,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=498860.0, ans=0.2 2023-11-19 01:18:41,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=498860.0, ans=0.0 2023-11-19 01:18:48,520 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 2700, loss[loss=0.09137, simple_loss=0.1057, pruned_loss=0.02771, audio_tagging_loss=0.01081, over 15886.00 frames. ], tot_loss[loss=0.09402, simple_loss=0.1114, pruned_loss=0.0276, audio_tagging_loss=0.01075, over 3056720.01 frames. ], batch size: 58, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:18:48,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=498926.6666666667, ans=0.0 2023-11-19 01:19:08,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=498993.3333333333, ans=0.1 2023-11-19 01:19:18,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=499060.0, ans=0.125 2023-11-19 01:19:20,348 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.993e+01 8.454e+01 8.998e+01 9.749e+01 1.436e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-19 01:19:32,527 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.95 vs. 
limit=15.0 2023-11-19 01:19:36,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=499193.3333333333, ans=0.5 2023-11-19 01:19:44,788 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 2750, loss[loss=0.1048, simple_loss=0.1329, pruned_loss=0.02892, audio_tagging_loss=0.009455, over 15937.00 frames. ], tot_loss[loss=0.09334, simple_loss=0.1101, pruned_loss=0.02747, audio_tagging_loss=0.01081, over 3055273.09 frames. ], batch size: 59, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:20:02,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=499326.6666666667, ans=0.125 2023-11-19 01:20:03,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=499326.6666666667, ans=0.2 2023-11-19 01:20:33,930 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 01:20:38,577 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.30 vs. limit=12.0 2023-11-19 01:20:40,206 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 2800, loss[loss=0.07348, simple_loss=0.08835, pruned_loss=0.01619, audio_tagging_loss=0.01312, over 14500.00 frames. ], tot_loss[loss=0.09336, simple_loss=0.1101, pruned_loss=0.02742, audio_tagging_loss=0.01089, over 3052814.64 frames. ], batch size: 55, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:20:41,514 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=499593.3333333333, ans=0.0 2023-11-19 01:20:42,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=499593.3333333333, ans=0.1 2023-11-19 01:20:42,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=499593.3333333333, ans=0.125 2023-11-19 01:20:45,977 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.34 vs. limit=12.0 2023-11-19 01:21:11,306 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.572e+01 8.889e+01 9.395e+01 1.013e+02 1.273e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-19 01:21:28,441 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.10 vs. limit=15.0 2023-11-19 01:21:35,074 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 2850, loss[loss=0.08817, simple_loss=0.101, pruned_loss=0.02459, audio_tagging_loss=0.01306, over 14262.00 frames. ], tot_loss[loss=0.09336, simple_loss=0.1105, pruned_loss=0.02735, audio_tagging_loss=0.01074, over 3055308.12 frames. 
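Two losses appear in each train_asr.py:1115 line: loss[...] is the current batch and tot_loss[...] is a running summary. The fractional frame counts (e.g. "over 3055308.12 frames." just above) suggest a frame-weighted average with exponential forgetting rather than a plain sum; a sketch under that assumption (the decay constant is invented, and the actual bookkeeping is in train_asr.py):

```python
class RunningLoss:
    """Frame-weighted running average with exponential forgetting."""

    def __init__(self, decay: float = 0.999) -> None:
        self.decay = decay  # invented value; whatever the script actually uses
        self.frames = 0.0
        self.weighted = 0.0

    def update(self, loss: float, num_frames: float) -> None:
        self.frames = self.decay * self.frames + num_frames
        self.weighted = self.decay * self.weighted + loss * num_frames

    @property
    def value(self) -> float:
        return self.weighted / max(self.frames, 1.0)

rl = RunningLoss()
rl.update(0.09336, 14262.0)
print(rl.value)  # the per-batch loss; converges to a smoothed tot_loss over time
```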
], batch size: 56, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:21:52,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=499993.3333333333, ans=0.1 2023-11-19 01:22:32,054 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 2900, loss[loss=0.07362, simple_loss=0.08547, pruned_loss=0.0186, audio_tagging_loss=0.01229, over 16058.00 frames. ], tot_loss[loss=0.09366, simple_loss=0.111, pruned_loss=0.02749, audio_tagging_loss=0.01069, over 3050844.36 frames. ], batch size: 61, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:22:34,589 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=22.5 2023-11-19 01:22:36,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=500260.0, ans=0.0 2023-11-19 01:22:49,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=500326.6666666667, ans=0.125 2023-11-19 01:22:51,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=500326.6666666667, ans=0.125 2023-11-19 01:22:58,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=500393.3333333333, ans=0.0 2023-11-19 01:23:02,355 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.772e+01 8.483e+01 9.561e+01 1.049e+02 1.503e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-19 01:23:06,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=500460.0, ans=0.125 2023-11-19 01:23:10,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=500460.0, ans=0.0 2023-11-19 01:23:27,931 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 2950, loss[loss=0.1113, simple_loss=0.125, pruned_loss=0.03796, audio_tagging_loss=0.01088, over 14873.00 frames. ], tot_loss[loss=0.09398, simple_loss=0.1114, pruned_loss=0.02765, audio_tagging_loss=0.01065, over 3044081.41 frames. ], batch size: 55, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:23:38,080 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.02 vs. limit=22.5 2023-11-19 01:23:38,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=500660.0, ans=0.125 2023-11-19 01:23:40,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=500660.0, ans=0.0 2023-11-19 01:23:40,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=500660.0, ans=0.0 2023-11-19 01:23:41,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=500660.0, ans=0.1 2023-11-19 01:24:03,923 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.03 vs. 
limit=15.0 2023-11-19 01:24:14,131 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=500860.0, ans=0.0 2023-11-19 01:24:22,441 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 3000, loss[loss=0.09364, simple_loss=0.1105, pruned_loss=0.02827, audio_tagging_loss=0.01013, over 16238.00 frames. ], tot_loss[loss=0.09576, simple_loss=0.1134, pruned_loss=0.02836, audio_tagging_loss=0.01069, over 3051056.95 frames. ], batch size: 59, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:24:22,441 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-19 01:24:37,343 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.3678, 2.6770, 2.6949, 2.1073, 2.6241, 2.5092, 2.5403, 2.6347], device='cuda:3') 2023-11-19 01:24:54,901 INFO [train_asr.py:1147] (3/4) Epoch 7, validation: loss=0.06857, simple_loss=0.05795, pruned_loss=0.007692, audio_tagging_loss=0.0319, over 4681554.00 frames. 2023-11-19 01:24:54,902 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-19 01:25:23,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=501060.0, ans=0.125 2023-11-19 01:25:24,917 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.894e+01 8.798e+01 9.751e+01 1.102e+02 1.409e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-19 01:25:31,444 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.84 vs. limit=15.0 2023-11-19 01:25:40,642 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.83 vs. limit=22.5 2023-11-19 01:25:47,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=501193.3333333333, ans=0.2 2023-11-19 01:25:50,598 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 3050, loss[loss=0.1041, simple_loss=0.1289, pruned_loss=0.02838, audio_tagging_loss=0.01128, over 15984.00 frames. ], tot_loss[loss=0.09601, simple_loss=0.1138, pruned_loss=0.02831, audio_tagging_loss=0.01081, over 3057365.04 frames. ], batch size: 58, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:25:57,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=501260.0, ans=0.2 2023-11-19 01:26:15,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=501393.3333333333, ans=0.07 2023-11-19 01:26:25,154 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 01:26:34,249 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.45 vs. 
limit=22.5 2023-11-19 01:26:43,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=501526.6666666667, ans=0.125 2023-11-19 01:26:46,168 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 3100, loss[loss=0.1258, simple_loss=0.153, pruned_loss=0.03817, audio_tagging_loss=0.01109, over 16345.00 frames. ], tot_loss[loss=0.09624, simple_loss=0.1139, pruned_loss=0.02839, audio_tagging_loss=0.01089, over 3054173.83 frames. ], batch size: 57, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:27:03,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=501660.0, ans=0.2 2023-11-19 01:27:11,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=501726.6666666667, ans=0.2 2023-11-19 01:27:17,742 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.054e+01 8.685e+01 9.204e+01 1.020e+02 1.427e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-19 01:27:42,158 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 3150, loss[loss=0.09733, simple_loss=0.1155, pruned_loss=0.02724, audio_tagging_loss=0.01234, over 16052.00 frames. ], tot_loss[loss=0.096, simple_loss=0.1139, pruned_loss=0.02819, audio_tagging_loss=0.01086, over 3056451.62 frames. ], batch size: 60, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:27:48,652 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.61 vs. limit=22.5 2023-11-19 01:27:49,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=501926.6666666667, ans=0.1 2023-11-19 01:27:52,603 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=501993.3333333333, ans=0.125 2023-11-19 01:27:55,980 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.26 vs. limit=15.0 2023-11-19 01:28:00,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=501993.3333333333, ans=0.125 2023-11-19 01:28:06,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=502060.0, ans=0.0 2023-11-19 01:28:10,792 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.05 vs. limit=15.0 2023-11-19 01:28:24,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=502126.6666666667, ans=0.0 2023-11-19 01:28:29,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=502193.3333333333, ans=0.125 2023-11-19 01:28:37,939 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 3200, loss[loss=0.03406, simple_loss=0.03135, pruned_loss=0.00745, audio_tagging_loss=0.01093, over 16159.00 frames. ], tot_loss[loss=0.09583, simple_loss=0.1135, pruned_loss=0.02815, audio_tagging_loss=0.01093, over 3058658.19 frames. 
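Each validation pass (see the batch 3000 block above) also dumps two diagnostics: per-module attn_weights_entropy tensors and the peak CUDA memory. The entropy vectors plausibly come from averaging the Shannon entropy of each head's attention rows, which would explain one value per head; the helper below is a guess at that computation, while max_memory_allocated is the standard PyTorch counter behind the "Maximum memory allocated so far" line (the exact call site is an assumption):

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    """attn: (num_heads, query_len, key_len) softmax weights; returns the mean
    row entropy per head -- a guess at the logged diagnostic, not its source."""
    p = attn.clamp_min(1e-20)
    return -(p * p.log()).sum(dim=-1).mean(dim=-1)

attn = torch.softmax(torch.randn(8, 50, 50), dim=-1)
print(attn_weights_entropy(attn))  # one entropy value per head

if torch.cuda.is_available():
    mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    print(f"Maximum memory allocated so far is {mb}MB")
```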
2023-11-19 01:28:38,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=502260.0, ans=0.125
2023-11-19 01:28:43,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=502260.0, ans=0.1
2023-11-19 01:28:52,207 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.33 vs. limit=15.0
2023-11-19 01:28:57,431 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.62 vs. limit=15.0
2023-11-19 01:29:03,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=502393.3333333333, ans=0.125
2023-11-19 01:29:06,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=502393.3333333333, ans=0.09899494936611666
2023-11-19 01:29:08,662 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.725e+01 8.572e+01 9.353e+01 1.015e+02 1.372e+02, threshold=1.871e+02, percent-clipped=0.0
2023-11-19 01:29:30,643 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.79 vs. limit=15.0
2023-11-19 01:29:33,137 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 3250, loss[loss=0.1053, simple_loss=0.1255, pruned_loss=0.03053, audio_tagging_loss=0.01197, over 15857.00 frames. ], tot_loss[loss=0.09595, simple_loss=0.1137, pruned_loss=0.02811, audio_tagging_loss=0.01099, over 3053143.82 frames. ], batch size: 58, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:29:34,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=502593.3333333333, ans=0.125
2023-11-19 01:29:37,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=502593.3333333333, ans=0.1
2023-11-19 01:30:00,306 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=502726.6666666667, ans=0.125
2023-11-19 01:30:29,455 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 3300, loss[loss=0.1361, simple_loss=0.156, pruned_loss=0.04519, audio_tagging_loss=0.01293, over 15480.00 frames. ], tot_loss[loss=0.0964, simple_loss=0.114, pruned_loss=0.02823, audio_tagging_loss=0.01117, over 3061405.46 frames. ], batch size: 54, lr: 1.00e-02, grad_scale: 32.0
2023-11-19 01:30:43,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=502993.3333333333, ans=0.0
2023-11-19 01:31:00,701 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.005e+01 8.481e+01 9.247e+01 1.047e+02 1.658e+02, threshold=1.849e+02, percent-clipped=0.0
2023-11-19 01:31:15,279 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0
2023-11-19 01:31:26,464 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 3350, loss[loss=0.08351, simple_loss=0.101, pruned_loss=0.02216, audio_tagging_loss=0.01087, over 14536.00 frames. ], tot_loss[loss=0.09628, simple_loss=0.1138, pruned_loss=0.02831, audio_tagging_loss=0.01105, over 3056384.66 frames. ], batch size: 57, lr: 1.00e-02, grad_scale: 32.0
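[Editor's note] The per-batch losses printed by train_asr.py:1115 are consistent with a fixed weighted sum of the three components: for batch 3350 just above, 0.5 x 0.101 + 0.02216 + 0.01087 = 0.08353, matching the reported loss=0.08351 to rounding. That is, the simple (linear) transducer loss appears to enter with weight 0.5 while the pruned transducer loss and the audio-tagging loss enter with weight 1.0; tot_loss is the same combination averaged over the ~3M most recent frames. A sketch of that arithmetic, with the 0.5 weight read off the logged numbers rather than taken from the code (the parameter name simple_loss_scale is illustrative):

```python
# Hedged reconstruction of the logged loss arithmetic.
def combine_losses(simple_loss: float,
                   pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_loss_scale: float = 0.5) -> float:
    # loss = 0.5 * simple + pruned + audio_tagging, inferred from the log.
    return simple_loss_scale * simple_loss + pruned_loss + audio_tagging_loss

# Batch 3350 above: 0.5 * 0.101 + 0.02216 + 0.01087 = 0.08353 ~ loss=0.08351.
assert abs(combine_losses(0.101, 0.02216, 0.01087) - 0.08351) < 5e-5
```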
2023-11-19 01:31:32,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=503260.0, ans=0.2
2023-11-19 01:31:37,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=503326.6666666667, ans=0.02
2023-11-19 01:31:51,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=503393.3333333333, ans=0.125
2023-11-19 01:32:15,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=503526.6666666667, ans=0.125
2023-11-19 01:32:18,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=503526.6666666667, ans=0.2
2023-11-19 01:32:21,467 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 3400, loss[loss=0.08275, simple_loss=0.09599, pruned_loss=0.02353, audio_tagging_loss=0.01123, over 14647.00 frames. ], tot_loss[loss=0.09534, simple_loss=0.1126, pruned_loss=0.02807, audio_tagging_loss=0.01096, over 3044795.00 frames. ], batch size: 57, lr: 1.00e-02, grad_scale: 32.0
2023-11-19 01:32:52,851 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.01 vs. limit=15.0
2023-11-19 01:32:53,223 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 8.287e+01 9.053e+01 9.903e+01 1.231e+02, threshold=1.811e+02, percent-clipped=0.0
2023-11-19 01:33:15,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=503860.0, ans=0.07
2023-11-19 01:33:15,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=503860.0, ans=0.2
2023-11-19 01:33:17,100 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 3450, loss[loss=0.08116, simple_loss=0.09812, pruned_loss=0.02414, audio_tagging_loss=0.007966, over 15029.00 frames. ], tot_loss[loss=0.09576, simple_loss=0.1135, pruned_loss=0.02822, audio_tagging_loss=0.01081, over 3051357.32 frames. ], batch size: 58, lr: 1.00e-02, grad_scale: 32.0
2023-11-19 01:33:34,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=503993.3333333333, ans=0.0
2023-11-19 01:33:44,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=504060.0, ans=0.125
2023-11-19 01:33:53,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=504126.6666666667, ans=0.0
2023-11-19 01:34:01,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=504193.3333333333, ans=0.125
2023-11-19 01:34:13,635 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 3500, loss[loss=0.09735, simple_loss=0.112, pruned_loss=0.0269, audio_tagging_loss=0.01443, over 14387.00 frames. ], tot_loss[loss=0.09542, simple_loss=0.1133, pruned_loss=0.0281, audio_tagging_loss=0.01069, over 3041946.30 frames. ], batch size: 55, lr: 1.00e-02, grad_scale: 16.0
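[Editor's note] The scaling.py:213 ScheduledFloat entries above record module hyper-parameters (dropout rates, skip rates, balancer probabilities, minimum bypass scales) that are functions of batch_count rather than constants, evidently interpolated between breakpoints and frozen at the final value thereafter; by batch_count ~500k most of them have reached their end values. A minimal sketch of such a schedule, assuming piecewise-linear interpolation (the breakpoints below are made up for illustration):

```python
class ScheduledFloat:
    """A float hyper-parameter defined by piecewise-linear interpolation
    over batch_count, in the spirit of the values logged by scaling.py."""

    def __init__(self, *points: tuple[float, float]):
        self.points = sorted(points)  # (batch_count, value) breakpoints
        self.batch_count = 0.0

    @property
    def value(self) -> float:
        x = self.batch_count
        if x <= self.points[0][0]:
            return self.points[0][1]
        if x >= self.points[-1][0]:
            return self.points[-1][1]
        for (x0, y0), (x1, y1) in zip(self.points, self.points[1:]):
            if x0 <= x <= x1:  # linear interpolation inside this segment
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Illustrative breakpoints only: a skip rate annealed from 0.5 to 0.0 over
# the first 4000 batches would log ans=0.0 at batch_count=503860.0.
skip_rate = ScheduledFloat((0.0, 0.5), (4000.0, 0.0))
skip_rate.batch_count = 503860.0
print(skip_rate.value)  # 0.0
```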
2023-11-19 01:34:22,219 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=15.0
2023-11-19 01:34:30,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=504326.6666666667, ans=0.0
2023-11-19 01:34:35,332 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=504393.3333333333, ans=0.0
2023-11-19 01:34:43,115 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 01:34:45,803 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.641e+01 8.567e+01 9.282e+01 1.040e+02 1.334e+02, threshold=1.856e+02, percent-clipped=0.0
2023-11-19 01:35:09,360 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 3550, loss[loss=0.07787, simple_loss=0.08671, pruned_loss=0.02239, audio_tagging_loss=0.01213, over 15205.00 frames. ], tot_loss[loss=0.0947, simple_loss=0.1121, pruned_loss=0.02793, audio_tagging_loss=0.01073, over 3041178.52 frames. ], batch size: 59, lr: 1.00e-02, grad_scale: 16.0
2023-11-19 01:35:18,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=504660.0, ans=0.0
2023-11-19 01:35:23,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=504660.0, ans=0.125
2023-11-19 01:35:35,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=504726.6666666667, ans=0.0
2023-11-19 01:35:38,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=504726.6666666667, ans=0.125
2023-11-19 01:35:56,389 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.36 vs. limit=15.0
2023-11-19 01:36:04,813 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 3600, loss[loss=0.09661, simple_loss=0.1089, pruned_loss=0.02898, audio_tagging_loss=0.01315, over 14673.00 frames. ], tot_loss[loss=0.09409, simple_loss=0.1114, pruned_loss=0.02766, audio_tagging_loss=0.01073, over 3038930.03 frames. ], batch size: 57, lr: 1.00e-02, grad_scale: 32.0
2023-11-19 01:36:19,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=504993.3333333333, ans=0.0
2023-11-19 01:36:28,435 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.81 vs. limit=15.0
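[Editor's note] The scaling.py:1022 Whitening entries report how far a module's activations are from having a white (isotropic) covariance; each "metric=X vs. limit=Y" pair (e.g. metric=4.81 vs. limit=15.0 just above) compares the measured value against the limit beyond which a corrective penalty kicks in. One common whiteness proxy, offered here as an assumption about what the metric could be rather than icefall's exact formula, is the ratio of the mean squared eigenvalue of the feature covariance to the squared mean eigenvalue, which equals 1.0 for perfectly white features:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """Whiteness proxy for features x of shape (N, C):
    mean(eig^2) / mean(eig)^2 of the per-group covariance, 1.0 when the
    covariance is isotropic. An assumed stand-in for scaling.py's metric."""
    N, C = x.shape
    assert C % num_groups == 0
    metrics = []
    for g in x.reshape(N, num_groups, C // num_groups).unbind(dim=1):
        g = g - g.mean(dim=0)            # center each channel group
        cov = (g.T @ g) / N              # (C/groups, C/groups) covariance
        eigs = torch.linalg.eigvalsh(cov)
        metrics.append((eigs ** 2).mean() / eigs.mean() ** 2)
    return float(torch.stack(metrics).mean())

# Random (already nearly white) features score close to 1.0, far below
# the limits of 15.0 / 22.5 seen in the log:
print(whitening_metric(torch.randn(1000, 384)))
```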
2023-11-19 01:36:36,842 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.969e+01 8.561e+01 9.359e+01 1.025e+02 1.551e+02, threshold=1.872e+02, percent-clipped=0.0
2023-11-19 01:36:48,688 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.51 vs. limit=15.0
2023-11-19 01:36:50,297 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=6.393e-01
2023-11-19 01:37:00,645 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 3650, loss[loss=0.08169, simple_loss=0.09622, pruned_loss=0.02327, audio_tagging_loss=0.0103, over 14881.00 frames. ], tot_loss[loss=0.09374, simple_loss=0.1109, pruned_loss=0.02755, audio_tagging_loss=0.01073, over 3042238.22 frames. ], batch size: 55, lr: 1.00e-02, grad_scale: 32.0
2023-11-19 01:37:03,160 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.35 vs. limit=15.0
2023-11-19 01:37:10,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=505326.6666666667, ans=0.0
2023-11-19 01:37:24,029 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 01:37:32,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=505460.0, ans=0.125
2023-11-19 01:37:42,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=505460.0, ans=0.1
2023-11-19 01:37:48,296 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.37 vs. limit=22.5
2023-11-19 01:37:54,421 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.72 vs. limit=15.0
2023-11-19 01:37:54,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=505593.3333333333, ans=0.125
2023-11-19 01:37:55,892 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 3700, loss[loss=0.08697, simple_loss=0.1013, pruned_loss=0.02753, audio_tagging_loss=0.008785, over 16726.00 frames. ], tot_loss[loss=0.09377, simple_loss=0.1112, pruned_loss=0.02748, audio_tagging_loss=0.0107, over 3049458.54 frames. ], batch size: 63, lr: 1.00e-02, grad_scale: 32.0
2023-11-19 01:38:08,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=505660.0, ans=0.0
2023-11-19 01:38:22,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=505726.6666666667, ans=10.0
2023-11-19 01:38:27,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=505726.6666666667, ans=0.1
2023-11-19 01:38:28,860 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 8.900e+01 9.822e+01 1.122e+02 1.774e+02, threshold=1.964e+02, percent-clipped=0.0
2023-11-19 01:38:51,766 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 3750, loss[loss=0.1004, simple_loss=0.1216, pruned_loss=0.02666, audio_tagging_loss=0.0129, over 16852.00 frames.
], tot_loss[loss=0.09549, simple_loss=0.1134, pruned_loss=0.0281, audio_tagging_loss=0.01069, over 3057469.40 frames. ], batch size: 64, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:39:13,158 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.23 vs. limit=15.0 2023-11-19 01:39:16,420 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.80 vs. limit=15.0 2023-11-19 01:39:30,911 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 01:39:48,458 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 3800, loss[loss=0.102, simple_loss=0.1252, pruned_loss=0.02981, audio_tagging_loss=0.009566, over 15870.00 frames. ], tot_loss[loss=0.09549, simple_loss=0.1133, pruned_loss=0.0281, audio_tagging_loss=0.01072, over 3051406.27 frames. ], batch size: 56, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:40:01,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=506326.6666666667, ans=0.125 2023-11-19 01:40:19,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=506393.3333333333, ans=0.0 2023-11-19 01:40:20,051 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.858e+01 8.899e+01 9.421e+01 1.052e+02 1.490e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-19 01:40:37,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=506526.6666666667, ans=0.125 2023-11-19 01:40:43,427 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 3850, loss[loss=0.1016, simple_loss=0.1234, pruned_loss=0.02824, audio_tagging_loss=0.01167, over 14787.00 frames. ], tot_loss[loss=0.09534, simple_loss=0.1127, pruned_loss=0.02805, audio_tagging_loss=0.01094, over 3046402.92 frames. ], batch size: 56, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:40:52,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=506593.3333333333, ans=0.0 2023-11-19 01:40:59,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=506660.0, ans=0.0 2023-11-19 01:41:02,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=506660.0, ans=0.5 2023-11-19 01:41:03,585 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.22 vs. 
limit=22.5 2023-11-19 01:41:20,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=506793.3333333333, ans=0.2 2023-11-19 01:41:20,280 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=506793.3333333333, ans=0.125 2023-11-19 01:41:25,913 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.02 vs. limit=22.5 2023-11-19 01:41:38,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=506860.0, ans=0.0 2023-11-19 01:41:41,584 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 3900, loss[loss=0.09757, simple_loss=0.1109, pruned_loss=0.03002, audio_tagging_loss=0.0121, over 15815.00 frames. ], tot_loss[loss=0.09533, simple_loss=0.1127, pruned_loss=0.028, audio_tagging_loss=0.01096, over 3052421.77 frames. ], batch size: 60, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:41:45,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=506926.6666666667, ans=0.0 2023-11-19 01:41:47,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=506926.6666666667, ans=0.125 2023-11-19 01:42:01,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=506993.3333333333, ans=0.0 2023-11-19 01:42:07,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=507060.0, ans=15.0 2023-11-19 01:42:09,363 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.87 vs. limit=15.0 2023-11-19 01:42:11,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=507060.0, ans=0.025 2023-11-19 01:42:13,964 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.813e+01 8.583e+01 9.293e+01 1.003e+02 1.876e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 01:42:15,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=507126.6666666667, ans=0.035 2023-11-19 01:42:19,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=507126.6666666667, ans=0.0 2023-11-19 01:42:22,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=507126.6666666667, ans=0.1 2023-11-19 01:42:38,335 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 3950, loss[loss=0.09922, simple_loss=0.1109, pruned_loss=0.03255, audio_tagging_loss=0.01123, over 15669.00 frames. ], tot_loss[loss=0.09466, simple_loss=0.1116, pruned_loss=0.02777, audio_tagging_loss=0.01108, over 3052176.17 frames. 
], batch size: 58, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:42:38,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=507260.0, ans=0.0 2023-11-19 01:42:48,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=507326.6666666667, ans=0.125 2023-11-19 01:43:07,465 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.533e-01 2023-11-19 01:43:33,330 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 4000, loss[loss=0.07115, simple_loss=0.08069, pruned_loss=0.01722, audio_tagging_loss=0.01358, over 15327.00 frames. ], tot_loss[loss=0.09477, simple_loss=0.1116, pruned_loss=0.02777, audio_tagging_loss=0.01118, over 3048234.27 frames. ], batch size: 60, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:43:35,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=507593.3333333333, ans=0.015 2023-11-19 01:43:58,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=507726.6666666667, ans=0.125 2023-11-19 01:44:06,446 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.614e+01 9.104e+01 9.882e+01 1.124e+02 1.409e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-19 01:44:13,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=507793.3333333333, ans=0.125 2023-11-19 01:44:19,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=507860.0, ans=0.05 2023-11-19 01:44:28,704 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 4050, loss[loss=0.1196, simple_loss=0.158, pruned_loss=0.0339, audio_tagging_loss=0.006673, over 15766.00 frames. ], tot_loss[loss=0.09577, simple_loss=0.1131, pruned_loss=0.02808, audio_tagging_loss=0.01112, over 3051975.31 frames. ], batch size: 57, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:44:31,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=507926.6666666667, ans=0.1 2023-11-19 01:44:32,455 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 01:44:50,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=507993.3333333333, ans=0.0 2023-11-19 01:45:03,076 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=508126.6666666667, ans=0.0 2023-11-19 01:45:10,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=508126.6666666667, ans=0.125 2023-11-19 01:45:25,005 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 4100, loss[loss=0.1, simple_loss=0.1151, pruned_loss=0.0316, audio_tagging_loss=0.01084, over 14932.00 frames. 
], tot_loss[loss=0.09634, simple_loss=0.1139, pruned_loss=0.02838, audio_tagging_loss=0.01099, over 3051145.91 frames. ], batch size: 56, lr: 9.99e-03, grad_scale: 32.0 2023-11-19 01:45:33,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=508260.0, ans=0.125 2023-11-19 01:45:44,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=508326.6666666667, ans=0.1 2023-11-19 01:45:56,109 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.924e+01 8.454e+01 9.161e+01 9.715e+01 1.284e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-19 01:46:10,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=508526.6666666667, ans=0.95 2023-11-19 01:46:15,802 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.32 vs. limit=15.0 2023-11-19 01:46:20,591 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 4150, loss[loss=0.08757, simple_loss=0.09886, pruned_loss=0.02625, audio_tagging_loss=0.0119, over 15765.00 frames. ], tot_loss[loss=0.09634, simple_loss=0.114, pruned_loss=0.02847, audio_tagging_loss=0.01088, over 3046219.30 frames. ], batch size: 58, lr: 9.99e-03, grad_scale: 32.0 2023-11-19 01:46:27,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=508593.3333333333, ans=0.125 2023-11-19 01:46:50,417 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.15 vs. limit=15.0 2023-11-19 01:46:54,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=508793.3333333333, ans=0.0 2023-11-19 01:46:58,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=508793.3333333333, ans=0.2 2023-11-19 01:47:01,988 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 01:47:04,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=508860.0, ans=0.125 2023-11-19 01:47:10,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=508860.0, ans=0.0 2023-11-19 01:47:12,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=508860.0, ans=10.0 2023-11-19 01:47:15,819 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 4200, loss[loss=0.1162, simple_loss=0.1362, pruned_loss=0.03837, audio_tagging_loss=0.00968, over 15565.00 frames. ], tot_loss[loss=0.09676, simple_loss=0.1147, pruned_loss=0.02875, audio_tagging_loss=0.01066, over 3053057.12 frames. 
], batch size: 56, lr: 9.99e-03, grad_scale: 32.0 2023-11-19 01:47:32,817 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.28 vs. limit=12.0 2023-11-19 01:47:34,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=508993.3333333333, ans=0.05 2023-11-19 01:47:48,448 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 8.868e+01 9.396e+01 1.025e+02 1.839e+02, threshold=1.879e+02, percent-clipped=1.0 2023-11-19 01:47:48,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=509126.6666666667, ans=0.0 2023-11-19 01:47:54,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=509126.6666666667, ans=0.0 2023-11-19 01:47:54,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=509126.6666666667, ans=0.05 2023-11-19 01:47:54,313 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.85 vs. limit=22.5 2023-11-19 01:47:59,299 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.897e-01 2023-11-19 01:48:11,860 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 4250, loss[loss=0.09913, simple_loss=0.1174, pruned_loss=0.02794, audio_tagging_loss=0.0125, over 15254.00 frames. ], tot_loss[loss=0.09599, simple_loss=0.1136, pruned_loss=0.02835, audio_tagging_loss=0.01083, over 3048442.58 frames. ], batch size: 55, lr: 9.98e-03, grad_scale: 32.0 2023-11-19 01:48:15,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=509260.0, ans=0.0 2023-11-19 01:48:32,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=509326.6666666667, ans=0.2 2023-11-19 01:48:32,945 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.58 vs. limit=15.0 2023-11-19 01:48:38,014 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.90 vs. limit=15.0 2023-11-19 01:48:40,121 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.32 vs. limit=15.0 2023-11-19 01:48:41,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=509393.3333333333, ans=0.0 2023-11-19 01:48:42,268 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.33 vs. 
limit=15.0 2023-11-19 01:48:43,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=509393.3333333333, ans=0.0 2023-11-19 01:48:44,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=509460.0, ans=0.2 2023-11-19 01:48:51,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=509460.0, ans=0.125 2023-11-19 01:48:52,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=509460.0, ans=0.125 2023-11-19 01:49:07,995 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 4300, loss[loss=0.09532, simple_loss=0.1157, pruned_loss=0.02903, audio_tagging_loss=0.00846, over 14734.00 frames. ], tot_loss[loss=0.09587, simple_loss=0.1137, pruned_loss=0.02825, audio_tagging_loss=0.01079, over 3049294.49 frames. ], batch size: 55, lr: 9.98e-03, grad_scale: 32.0 2023-11-19 01:49:11,616 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.78 vs. limit=12.0 2023-11-19 01:49:12,725 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.02 vs. limit=15.0 2023-11-19 01:49:14,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=509593.3333333333, ans=0.1 2023-11-19 01:49:22,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=509660.0, ans=0.125 2023-11-19 01:49:26,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=509660.0, ans=0.0 2023-11-19 01:49:39,661 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.988e+01 8.902e+01 9.994e+01 1.090e+02 2.369e+02, threshold=1.999e+02, percent-clipped=2.0 2023-11-19 01:49:45,413 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.74 vs. limit=6.0 2023-11-19 01:50:02,568 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 4350, loss[loss=0.08854, simple_loss=0.1016, pruned_loss=0.02772, audio_tagging_loss=0.009992, over 15390.00 frames. ], tot_loss[loss=0.09583, simple_loss=0.1137, pruned_loss=0.02819, audio_tagging_loss=0.01077, over 3050893.58 frames. ], batch size: 57, lr: 9.98e-03, grad_scale: 32.0 2023-11-19 01:50:39,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=510126.6666666667, ans=0.09899494936611666 2023-11-19 01:50:49,469 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=510193.3333333333, ans=0.0 2023-11-19 01:50:51,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=510193.3333333333, ans=0.125 2023-11-19 01:50:58,224 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 4400, loss[loss=0.09724, simple_loss=0.1201, pruned_loss=0.026, audio_tagging_loss=0.01121, over 16438.00 frames. ], tot_loss[loss=0.09534, simple_loss=0.1133, pruned_loss=0.02791, audio_tagging_loss=0.01076, over 3048718.21 frames. 
], batch size: 64, lr: 9.98e-03, grad_scale: 32.0 2023-11-19 01:51:15,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=510326.6666666667, ans=0.125 2023-11-19 01:51:20,495 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.21 vs. limit=15.0 2023-11-19 01:51:29,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=510393.3333333333, ans=0.125 2023-11-19 01:51:30,391 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.846e+01 8.237e+01 9.053e+01 9.942e+01 1.233e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-19 01:51:33,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=510460.0, ans=0.125 2023-11-19 01:51:54,567 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 4450, loss[loss=0.07805, simple_loss=0.08896, pruned_loss=0.02362, audio_tagging_loss=0.009952, over 15857.00 frames. ], tot_loss[loss=0.09504, simple_loss=0.1131, pruned_loss=0.02777, audio_tagging_loss=0.01073, over 3050603.64 frames. ], batch size: 61, lr: 9.97e-03, grad_scale: 32.0 2023-11-19 01:51:56,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=510593.3333333333, ans=0.125 2023-11-19 01:51:57,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=510593.3333333333, ans=0.125 2023-11-19 01:52:01,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=510593.3333333333, ans=0.125 2023-11-19 01:52:12,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=510660.0, ans=0.0 2023-11-19 01:52:30,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=510793.3333333333, ans=0.125 2023-11-19 01:52:49,741 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 4500, loss[loss=0.1223, simple_loss=0.1499, pruned_loss=0.04008, audio_tagging_loss=0.007342, over 14935.00 frames. ], tot_loss[loss=0.09402, simple_loss=0.1116, pruned_loss=0.0275, audio_tagging_loss=0.01073, over 3045361.18 frames. ], batch size: 54, lr: 9.97e-03, grad_scale: 32.0 2023-11-19 01:52:57,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=510926.6666666667, ans=0.125 2023-11-19 01:52:57,594 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.80 vs. 
limit=15.0 2023-11-19 01:53:08,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=510993.3333333333, ans=0.0 2023-11-19 01:53:21,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=511060.0, ans=0.1 2023-11-19 01:53:22,563 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.087e+01 9.019e+01 9.835e+01 1.060e+02 1.349e+02, threshold=1.967e+02, percent-clipped=0.0 2023-11-19 01:53:24,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=511126.6666666667, ans=0.95 2023-11-19 01:53:28,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=511126.6666666667, ans=0.04949747468305833 2023-11-19 01:53:40,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=511193.3333333333, ans=0.5 2023-11-19 01:53:45,532 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 4550, loss[loss=0.08558, simple_loss=0.1085, pruned_loss=0.02412, audio_tagging_loss=0.007228, over 14191.00 frames. ], tot_loss[loss=0.09395, simple_loss=0.1114, pruned_loss=0.02747, audio_tagging_loss=0.01081, over 3033242.16 frames. ], batch size: 53, lr: 9.97e-03, grad_scale: 32.0 2023-11-19 01:53:51,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=511260.0, ans=10.0 2023-11-19 01:53:51,085 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.23 vs. limit=15.0 2023-11-19 01:53:54,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=511260.0, ans=0.125 2023-11-19 01:53:59,040 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.24 vs. limit=15.0 2023-11-19 01:54:08,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=511393.3333333333, ans=0.125 2023-11-19 01:54:10,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=511393.3333333333, ans=15.0 2023-11-19 01:54:29,220 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 01:54:36,971 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=511526.6666666667, ans=0.05 2023-11-19 01:54:37,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=511526.6666666667, ans=0.125 2023-11-19 01:54:39,312 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.20 vs. limit=15.0 2023-11-19 01:54:41,986 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 4600, loss[loss=0.08762, simple_loss=0.1054, pruned_loss=0.02474, audio_tagging_loss=0.01017, over 15435.00 frames. ], tot_loss[loss=0.09406, simple_loss=0.1113, pruned_loss=0.02756, audio_tagging_loss=0.01084, over 3034683.03 frames. ], batch size: 59, lr: 9.96e-03, grad_scale: 32.0 2023-11-19 01:54:42,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=511593.3333333333, ans=0.0 2023-11-19 01:55:14,085 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.023e+01 8.730e+01 9.456e+01 1.065e+02 1.421e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-19 01:55:23,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=511793.3333333333, ans=0.125 2023-11-19 01:55:37,975 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 4650, loss[loss=0.07747, simple_loss=0.09491, pruned_loss=0.01877, audio_tagging_loss=0.01124, over 15028.00 frames. ], tot_loss[loss=0.09382, simple_loss=0.1111, pruned_loss=0.02737, audio_tagging_loss=0.01088, over 3036107.32 frames. ], batch size: 56, lr: 9.96e-03, grad_scale: 32.0 2023-11-19 01:55:46,540 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 01:55:55,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=511993.3333333333, ans=0.025 2023-11-19 01:56:01,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=512060.0, ans=0.125 2023-11-19 01:56:17,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=512126.6666666667, ans=0.035 2023-11-19 01:56:27,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=512193.3333333333, ans=0.125 2023-11-19 01:56:30,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=512193.3333333333, ans=0.1 2023-11-19 01:56:30,161 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=22.5 2023-11-19 01:56:31,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=512193.3333333333, ans=0.1 2023-11-19 01:56:33,517 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 4700, loss[loss=0.0928, simple_loss=0.1104, pruned_loss=0.02463, audio_tagging_loss=0.013, over 16301.00 frames. ], tot_loss[loss=0.093, simple_loss=0.1097, pruned_loss=0.02704, audio_tagging_loss=0.01111, over 3036139.57 frames. 
], batch size: 63, lr: 9.96e-03, grad_scale: 32.0 2023-11-19 01:56:40,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=512260.0, ans=0.125 2023-11-19 01:56:44,715 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.33 vs. limit=10.0 2023-11-19 01:57:00,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=512393.3333333333, ans=0.125 2023-11-19 01:57:05,666 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.776e+01 8.634e+01 9.306e+01 1.008e+02 1.470e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-19 01:57:26,790 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.03 vs. limit=15.0 2023-11-19 01:57:29,458 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 4750, loss[loss=0.05831, simple_loss=0.05861, pruned_loss=0.01359, audio_tagging_loss=0.01541, over 14723.00 frames. ], tot_loss[loss=0.09255, simple_loss=0.1092, pruned_loss=0.02681, audio_tagging_loss=0.01114, over 3032354.97 frames. ], batch size: 60, lr: 9.95e-03, grad_scale: 32.0 2023-11-19 01:57:33,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=512593.3333333333, ans=0.125 2023-11-19 01:58:01,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=512793.3333333333, ans=0.125 2023-11-19 01:58:02,168 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.54 vs. limit=10.0 2023-11-19 01:58:04,840 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.66 vs. limit=15.0 2023-11-19 01:58:21,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=512860.0, ans=0.125 2023-11-19 01:58:25,646 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 4800, loss[loss=0.1126, simple_loss=0.1359, pruned_loss=0.03377, audio_tagging_loss=0.01087, over 15383.00 frames. ], tot_loss[loss=0.09312, simple_loss=0.1095, pruned_loss=0.02715, audio_tagging_loss=0.01124, over 3030450.33 frames. 
], batch size: 55, lr: 9.95e-03, grad_scale: 32.0 2023-11-19 01:58:40,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=512993.3333333333, ans=0.2 2023-11-19 01:58:46,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=513060.0, ans=0.035 2023-11-19 01:58:57,703 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.229e+01 8.526e+01 9.167e+01 1.022e+02 1.486e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-19 01:58:57,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=513126.6666666667, ans=0.1 2023-11-19 01:59:02,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=513126.6666666667, ans=0.0 2023-11-19 01:59:02,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=513126.6666666667, ans=0.125 2023-11-19 01:59:03,561 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.43 vs. limit=6.0 2023-11-19 01:59:17,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=513193.3333333333, ans=0.125 2023-11-19 01:59:19,562 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=513260.0, ans=0.125 2023-11-19 01:59:20,423 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 4850, loss[loss=0.1064, simple_loss=0.1334, pruned_loss=0.02709, audio_tagging_loss=0.01258, over 15198.00 frames. ], tot_loss[loss=0.09403, simple_loss=0.1106, pruned_loss=0.02737, audio_tagging_loss=0.01137, over 3031837.56 frames. ], batch size: 56, lr: 9.95e-03, grad_scale: 32.0 2023-11-19 01:59:50,682 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.17 vs. limit=10.0 2023-11-19 01:59:54,200 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.15 vs. limit=15.0 2023-11-19 01:59:55,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=513460.0, ans=0.1 2023-11-19 02:00:10,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=513526.6666666667, ans=0.0 2023-11-19 02:00:16,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=513593.3333333333, ans=0.125 2023-11-19 02:00:17,673 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 4900, loss[loss=0.1143, simple_loss=0.1343, pruned_loss=0.03844, audio_tagging_loss=0.008722, over 14990.00 frames. ], tot_loss[loss=0.09407, simple_loss=0.1109, pruned_loss=0.02739, audio_tagging_loss=0.01122, over 3030984.66 frames. 
], batch size: 55, lr: 9.94e-03, grad_scale: 32.0 2023-11-19 02:00:33,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=513660.0, ans=0.125 2023-11-19 02:00:49,353 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.320e+01 8.524e+01 9.139e+01 9.781e+01 1.634e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-19 02:01:12,689 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 4950, loss[loss=0.1189, simple_loss=0.1511, pruned_loss=0.03481, audio_tagging_loss=0.008549, over 15992.00 frames. ], tot_loss[loss=0.09446, simple_loss=0.1114, pruned_loss=0.02772, audio_tagging_loss=0.01106, over 3033877.43 frames. ], batch size: 61, lr: 9.94e-03, grad_scale: 32.0 2023-11-19 02:01:43,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=514060.0, ans=0.125 2023-11-19 02:02:06,280 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:02:08,156 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 5000, loss[loss=0.09582, simple_loss=0.1195, pruned_loss=0.0265, audio_tagging_loss=0.009588, over 14669.00 frames. ], tot_loss[loss=0.09501, simple_loss=0.112, pruned_loss=0.0281, audio_tagging_loss=0.01088, over 3031878.67 frames. ], batch size: 54, lr: 9.94e-03, grad_scale: 32.0 2023-11-19 02:02:11,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=514260.0, ans=0.125 2023-11-19 02:02:20,519 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.99 vs. limit=22.5 2023-11-19 02:02:25,051 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.86 vs. limit=6.0 2023-11-19 02:02:39,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=514393.3333333333, ans=0.125 2023-11-19 02:02:40,225 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.030e+01 8.705e+01 9.523e+01 1.061e+02 1.468e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-19 02:02:45,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=514460.0, ans=0.125 2023-11-19 02:02:48,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=514460.0, ans=0.0 2023-11-19 02:02:57,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=514526.6666666667, ans=0.0 2023-11-19 02:03:02,377 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.69 vs. limit=6.0 2023-11-19 02:03:04,459 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 5050, loss[loss=0.1102, simple_loss=0.1338, pruned_loss=0.0324, audio_tagging_loss=0.01089, over 15180.00 frames. ], tot_loss[loss=0.09437, simple_loss=0.1114, pruned_loss=0.02789, audio_tagging_loss=0.0108, over 3031330.60 frames. 
], batch size: 59, lr: 9.93e-03, grad_scale: 32.0 2023-11-19 02:03:06,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=514593.3333333333, ans=0.09899494936611666 2023-11-19 02:03:49,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=514860.0, ans=0.0 2023-11-19 02:03:59,686 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 5100, loss[loss=0.08958, simple_loss=0.1142, pruned_loss=0.02459, audio_tagging_loss=0.007871, over 14475.00 frames. ], tot_loss[loss=0.09457, simple_loss=0.1116, pruned_loss=0.02799, audio_tagging_loss=0.01078, over 3031600.97 frames. ], batch size: 52, lr: 9.93e-03, grad_scale: 32.0 2023-11-19 02:04:16,446 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.76 vs. limit=10.0 2023-11-19 02:04:32,426 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.830e+01 8.394e+01 9.066e+01 1.016e+02 2.426e+02, threshold=1.813e+02, percent-clipped=1.0 2023-11-19 02:04:33,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=515126.6666666667, ans=0.2 2023-11-19 02:04:38,307 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.58 vs. limit=15.0 2023-11-19 02:04:54,525 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 5150, loss[loss=0.09737, simple_loss=0.1195, pruned_loss=0.02772, audio_tagging_loss=0.00992, over 15208.00 frames. ], tot_loss[loss=0.09374, simple_loss=0.111, pruned_loss=0.02749, audio_tagging_loss=0.01073, over 3035321.53 frames. ], batch size: 58, lr: 9.93e-03, grad_scale: 32.0 2023-11-19 02:05:05,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=515326.6666666667, ans=0.0 2023-11-19 02:05:21,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=515393.3333333333, ans=0.125 2023-11-19 02:05:39,584 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2023-11-19 02:05:39,864 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.23 vs. limit=15.0 2023-11-19 02:05:51,339 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 5200, loss[loss=0.1085, simple_loss=0.1323, pruned_loss=0.03077, audio_tagging_loss=0.01164, over 15623.00 frames. ], tot_loss[loss=0.09455, simple_loss=0.1121, pruned_loss=0.02781, audio_tagging_loss=0.01067, over 3037515.17 frames. 
], batch size: 57, lr: 9.92e-03, grad_scale: 32.0 2023-11-19 02:05:55,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=515593.3333333333, ans=0.125 2023-11-19 02:05:57,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=515593.3333333333, ans=0.125 2023-11-19 02:06:06,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=515660.0, ans=12.0 2023-11-19 02:06:06,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=515660.0, ans=0.05 2023-11-19 02:06:20,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=515726.6666666667, ans=0.125 2023-11-19 02:06:22,514 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.952e+01 8.557e+01 9.238e+01 1.015e+02 1.542e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-19 02:06:33,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=515793.3333333333, ans=0.125 2023-11-19 02:06:34,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=515793.3333333333, ans=0.125 2023-11-19 02:06:46,879 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 5250, loss[loss=0.08625, simple_loss=0.1103, pruned_loss=0.02079, audio_tagging_loss=0.01029, over 15884.00 frames. ], tot_loss[loss=0.0945, simple_loss=0.1123, pruned_loss=0.02766, audio_tagging_loss=0.01071, over 3040135.56 frames. ], batch size: 61, lr: 9.92e-03, grad_scale: 32.0 2023-11-19 02:07:12,342 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:07:20,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=516126.6666666667, ans=0.125 2023-11-19 02:07:21,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=516126.6666666667, ans=0.07 2023-11-19 02:07:34,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=516193.3333333333, ans=0.125 2023-11-19 02:07:37,139 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.64 vs. limit=15.0 2023-11-19 02:07:37,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=516193.3333333333, ans=0.1 2023-11-19 02:07:41,872 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 5300, loss[loss=0.1024, simple_loss=0.1292, pruned_loss=0.0282, audio_tagging_loss=0.009634, over 15863.00 frames. ], tot_loss[loss=0.09457, simple_loss=0.1124, pruned_loss=0.02771, audio_tagging_loss=0.01064, over 3041957.47 frames. ], batch size: 57, lr: 9.92e-03, grad_scale: 32.0 2023-11-19 02:07:47,487 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.23 vs. 
limit=22.5 2023-11-19 02:07:58,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=516326.6666666667, ans=0.0 2023-11-19 02:08:01,483 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.52 vs. limit=15.0 2023-11-19 02:08:08,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=516393.3333333333, ans=0.125 2023-11-19 02:08:14,519 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 8.731e+01 9.566e+01 1.032e+02 1.487e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-19 02:08:21,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=516460.0, ans=0.0 2023-11-19 02:08:36,863 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=516593.3333333333, ans=0.2 2023-11-19 02:08:37,653 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 5350, loss[loss=0.09792, simple_loss=0.1217, pruned_loss=0.02662, audio_tagging_loss=0.01045, over 15939.00 frames. ], tot_loss[loss=0.0941, simple_loss=0.112, pruned_loss=0.02745, audio_tagging_loss=0.01064, over 3037331.05 frames. ], batch size: 58, lr: 9.91e-03, grad_scale: 32.0 2023-11-19 02:08:37,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=516593.3333333333, ans=0.0 2023-11-19 02:08:44,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=516593.3333333333, ans=0.0 2023-11-19 02:09:00,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=516726.6666666667, ans=0.0 2023-11-19 02:09:01,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=516726.6666666667, ans=0.0 2023-11-19 02:09:02,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=516726.6666666667, ans=0.1 2023-11-19 02:09:05,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=516726.6666666667, ans=0.2 2023-11-19 02:09:08,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=516726.6666666667, ans=0.07 2023-11-19 02:09:09,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=516793.3333333333, ans=0.125 2023-11-19 02:09:22,994 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0 2023-11-19 02:09:29,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=516860.0, ans=0.1 2023-11-19 02:09:33,605 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 5400, loss[loss=0.1158, simple_loss=0.1422, pruned_loss=0.03488, audio_tagging_loss=0.009841, over 15878.00 frames. ], tot_loss[loss=0.09453, simple_loss=0.1123, pruned_loss=0.02771, audio_tagging_loss=0.01066, over 3052069.38 frames. 
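Note: in every [optim.py:476] record here the five grad-norm quartiles read as min/25%/50%/75%/max of recently observed gradient norms, and the threshold equals Clipping_scale times the median (for the record above, 2.0 * 9.566e+01 = 1.913e+02); percent-clipped is then the share of recent steps whose norm exceeded that threshold. A hedged sketch of that bookkeeping (illustrative names, not icefall's optim.py API):

    import torch

    def clipping_stats(recent_norms, clipping_scale: float = 2.0):
        norms = torch.as_tensor(recent_norms, dtype=torch.float32)
        # min / 25% / 50% / 75% / max, as printed in the records above
        q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]  # 2.0 x median
        percent_clipped = 100.0 * (norms > threshold).float().mean()
        return q, threshold, percent_clipped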
], batch size: 57, lr: 9.91e-03, grad_scale: 32.0 2023-11-19 02:09:51,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=516993.3333333333, ans=0.125 2023-11-19 02:09:58,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=517060.0, ans=0.2 2023-11-19 02:10:05,680 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.972e+01 8.660e+01 9.823e+01 1.115e+02 1.582e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-19 02:10:12,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=517126.6666666667, ans=0.0 2023-11-19 02:10:14,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=517126.6666666667, ans=0.0 2023-11-19 02:10:28,371 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 5450, loss[loss=0.1212, simple_loss=0.1516, pruned_loss=0.0355, audio_tagging_loss=0.009885, over 14590.00 frames. ], tot_loss[loss=0.0948, simple_loss=0.1125, pruned_loss=0.02776, audio_tagging_loss=0.01079, over 3044791.75 frames. ], batch size: 53, lr: 9.91e-03, grad_scale: 32.0 2023-11-19 02:10:29,687 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=517260.0, ans=0.1 2023-11-19 02:10:32,056 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.32 vs. limit=15.0 2023-11-19 02:10:58,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=517393.3333333333, ans=0.1 2023-11-19 02:11:03,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=517460.0, ans=0.1 2023-11-19 02:11:24,123 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 5500, loss[loss=0.08575, simple_loss=0.09973, pruned_loss=0.02412, audio_tagging_loss=0.01177, over 15800.00 frames. ], tot_loss[loss=0.09477, simple_loss=0.1122, pruned_loss=0.02775, audio_tagging_loss=0.01091, over 3044651.65 frames. ], batch size: 60, lr: 9.90e-03, grad_scale: 64.0 2023-11-19 02:11:25,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=517593.3333333333, ans=0.125 2023-11-19 02:11:31,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=517593.3333333333, ans=0.015 2023-11-19 02:11:31,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=517593.3333333333, ans=0.1 2023-11-19 02:11:34,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=517593.3333333333, ans=0.125 2023-11-19 02:11:37,552 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.39 vs. 
limit=10.0 2023-11-19 02:11:46,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=517726.6666666667, ans=0.0 2023-11-19 02:11:55,915 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.797e+01 8.498e+01 9.493e+01 1.061e+02 1.375e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-19 02:12:20,006 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 5550, loss[loss=0.1117, simple_loss=0.1319, pruned_loss=0.03533, audio_tagging_loss=0.01039, over 14984.00 frames. ], tot_loss[loss=0.09584, simple_loss=0.1137, pruned_loss=0.02809, audio_tagging_loss=0.01092, over 3048112.06 frames. ], batch size: 55, lr: 9.90e-03, grad_scale: 64.0 2023-11-19 02:12:23,660 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.96 vs. limit=15.0 2023-11-19 02:12:26,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=517926.6666666667, ans=0.2 2023-11-19 02:12:29,050 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.31 vs. limit=12.0 2023-11-19 02:12:34,126 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.96 vs. limit=15.0 2023-11-19 02:13:12,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=518193.3333333333, ans=0.0 2023-11-19 02:13:14,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=518260.0, ans=0.125 2023-11-19 02:13:15,100 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 5600, loss[loss=0.08863, simple_loss=0.1014, pruned_loss=0.02611, audio_tagging_loss=0.01181, over 16247.00 frames. ], tot_loss[loss=0.0953, simple_loss=0.1128, pruned_loss=0.02785, audio_tagging_loss=0.01106, over 3056739.89 frames. ], batch size: 63, lr: 9.90e-03, grad_scale: 64.0 2023-11-19 02:13:17,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=518260.0, ans=0.0 2023-11-19 02:13:33,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=518326.6666666667, ans=0.125 2023-11-19 02:13:46,978 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.885e+01 8.345e+01 9.240e+01 1.027e+02 1.400e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-19 02:13:55,458 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 02:14:08,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=518593.3333333333, ans=0.0 2023-11-19 02:14:09,779 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 5650, loss[loss=0.09506, simple_loss=0.1072, pruned_loss=0.02718, audio_tagging_loss=0.01429, over 14533.00 frames. 
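Note: the WARNING above concerns a one-second AudioSet clip carrying a dummy transcript. Its 100 feature frames subsample to 23, one fewer than its 24 BPE tokens, so the utterance filter drops it. The arithmetic is consistent with a convolutional frontend computing ((T - 7) // 2 + 1) // 2; the exclusion rule below is inferred from the logged numbers, not read from train_asr.py:

    def frames_after_subsampling(num_frames: int) -> int:
        # Reproduces the logged 100 -> 23 reduction.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Inferred rule: drop cuts whose subsampled length cannot cover
        # the token sequence (23 < 24 for the placeholder cut above).
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23
    print(keep_cut(100, 24))              # False -> "Exclude cut ..." warning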
], tot_loss[loss=0.09511, simple_loss=0.1125, pruned_loss=0.02776, audio_tagging_loss=0.01113, over 3056328.40 frames. ], batch size: 58, lr: 9.90e-03, grad_scale: 64.0 2023-11-19 02:14:10,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=518593.3333333333, ans=0.2 2023-11-19 02:14:31,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=518726.6666666667, ans=0.1 2023-11-19 02:14:37,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=518726.6666666667, ans=0.1 2023-11-19 02:14:50,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=518793.3333333333, ans=0.125 2023-11-19 02:14:54,657 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.34 vs. limit=10.0 2023-11-19 02:14:57,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=518860.0, ans=0.0 2023-11-19 02:15:00,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=518860.0, ans=0.0 2023-11-19 02:15:06,299 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 5700, loss[loss=0.08965, simple_loss=0.1059, pruned_loss=0.02771, audio_tagging_loss=0.008991, over 15781.00 frames. ], tot_loss[loss=0.09429, simple_loss=0.1113, pruned_loss=0.02751, audio_tagging_loss=0.01114, over 3058357.75 frames. ], batch size: 59, lr: 9.89e-03, grad_scale: 64.0 2023-11-19 02:15:10,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=518926.6666666667, ans=0.2 2023-11-19 02:15:11,096 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.23 vs. limit=15.0 2023-11-19 02:15:19,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=518993.3333333333, ans=0.125 2023-11-19 02:15:38,526 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.190e+01 8.416e+01 9.360e+01 1.085e+02 2.101e+02, threshold=1.872e+02, percent-clipped=1.0 2023-11-19 02:15:39,855 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:15:41,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=519126.6666666667, ans=0.125 2023-11-19 02:15:43,588 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.33 vs. limit=15.0 2023-11-19 02:15:56,532 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:16:01,538 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 5750, loss[loss=0.08633, simple_loss=0.1003, pruned_loss=0.02541, audio_tagging_loss=0.01074, over 14154.00 frames. ], tot_loss[loss=0.09443, simple_loss=0.1115, pruned_loss=0.02766, audio_tagging_loss=0.011, over 3047628.85 frames. 
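Note: each ScheduledFloat record pairs a module parameter with the current batch_count and its scheduled value (ans=...); by this point in training most values sit on their final plateau (dropout_p at 0.1, balancer probs at 0.125, many skip rates at 0.0). A minimal sketch of such a piecewise-linear schedule; the (0, 0.3) -> (20000, 0.1) breakpoints are an assumed example, not read from this run:

    import bisect

    class PiecewiseLinearSchedule:
        def __init__(self, points):
            # points: (batch_count, value) pairs, sorted by batch_count
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def __call__(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

    dropout_p = PiecewiseLinearSchedule([(0.0, 0.3), (20000.0, 0.1)])
    print(dropout_p(518726.0))  # 0.1: past the last breakpoint, as logged above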
], batch size: 54, lr: 9.89e-03, grad_scale: 32.0 2023-11-19 02:16:07,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=519260.0, ans=0.125 2023-11-19 02:16:17,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=519326.6666666667, ans=0.125 2023-11-19 02:16:52,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=519526.6666666667, ans=0.125 2023-11-19 02:16:56,607 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 5800, loss[loss=0.1329, simple_loss=0.1655, pruned_loss=0.04282, audio_tagging_loss=0.007313, over 15188.00 frames. ], tot_loss[loss=0.09416, simple_loss=0.1114, pruned_loss=0.02754, audio_tagging_loss=0.01091, over 3053033.53 frames. ], batch size: 58, lr: 9.89e-03, grad_scale: 16.0 2023-11-19 02:17:06,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=519593.3333333333, ans=0.0 2023-11-19 02:17:06,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=519593.3333333333, ans=0.0 2023-11-19 02:17:21,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=519726.6666666667, ans=0.125 2023-11-19 02:17:30,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=519793.3333333333, ans=0.125 2023-11-19 02:17:31,533 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.960e+01 8.540e+01 9.536e+01 1.116e+02 2.278e+02, threshold=1.907e+02, percent-clipped=1.0 2023-11-19 02:17:53,439 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 5850, loss[loss=0.104, simple_loss=0.1358, pruned_loss=0.02508, audio_tagging_loss=0.01103, over 14812.00 frames. ], tot_loss[loss=0.09442, simple_loss=0.1117, pruned_loss=0.02769, audio_tagging_loss=0.01087, over 3048313.77 frames. ], batch size: 57, lr: 9.88e-03, grad_scale: 16.0 2023-11-19 02:17:59,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=519926.6666666667, ans=0.125 2023-11-19 02:18:04,191 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.01 vs. 
limit=22.5 2023-11-19 02:18:06,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=519993.3333333333, ans=0.125 2023-11-19 02:18:20,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=520060.0, ans=0.07 2023-11-19 02:18:22,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=520060.0, ans=0.07 2023-11-19 02:18:44,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=520193.3333333333, ans=0.125 2023-11-19 02:18:47,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=520193.3333333333, ans=0.125 2023-11-19 02:18:49,462 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 5900, loss[loss=0.06869, simple_loss=0.07995, pruned_loss=0.01896, audio_tagging_loss=0.009745, over 16242.00 frames. ], tot_loss[loss=0.09358, simple_loss=0.1109, pruned_loss=0.02727, audio_tagging_loss=0.01084, over 3046287.56 frames. ], batch size: 61, lr: 9.88e-03, grad_scale: 16.0 2023-11-19 02:19:01,806 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.69 vs. limit=6.0 2023-11-19 02:19:14,912 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.67 vs. limit=22.5 2023-11-19 02:19:23,929 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.519e+01 8.828e+01 9.582e+01 1.059e+02 2.362e+02, threshold=1.916e+02, percent-clipped=1.0 2023-11-19 02:19:26,445 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.91 vs. limit=22.5 2023-11-19 02:19:41,311 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.83 vs. limit=6.0 2023-11-19 02:19:44,673 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 5950, loss[loss=0.09218, simple_loss=0.1184, pruned_loss=0.02402, audio_tagging_loss=0.008958, over 16474.00 frames. ], tot_loss[loss=0.09337, simple_loss=0.1108, pruned_loss=0.02716, audio_tagging_loss=0.01082, over 3051796.47 frames. ], batch size: 60, lr: 9.88e-03, grad_scale: 16.0 2023-11-19 02:19:47,373 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.98 vs. limit=22.5 2023-11-19 02:19:47,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=520593.3333333333, ans=0.1 2023-11-19 02:19:48,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=520593.3333333333, ans=0.025 2023-11-19 02:20:00,565 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.55 vs. 
limit=10.0 2023-11-19 02:20:11,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=520726.6666666667, ans=0.1 2023-11-19 02:20:18,068 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2023-11-19 02:20:40,640 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 6000, loss[loss=0.0582, simple_loss=0.06371, pruned_loss=0.01214, audio_tagging_loss=0.0142, over 15702.00 frames. ], tot_loss[loss=0.09292, simple_loss=0.1098, pruned_loss=0.02706, audio_tagging_loss=0.01093, over 3046807.51 frames. ], batch size: 60, lr: 9.87e-03, grad_scale: 32.0 2023-11-19 02:20:40,641 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-19 02:21:13,027 INFO [train_asr.py:1147] (3/4) Epoch 7, validation: loss=0.06924, simple_loss=0.05776, pruned_loss=0.007549, audio_tagging_loss=0.0328, over 4681554.00 frames. 2023-11-19 02:21:13,027 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-19 02:21:36,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=521060.0, ans=0.125 2023-11-19 02:21:40,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=521060.0, ans=0.1 2023-11-19 02:21:47,397 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.133e+01 8.768e+01 9.511e+01 1.039e+02 1.786e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-19 02:21:55,307 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 02:21:55,992 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.55 vs. limit=15.0 2023-11-19 02:21:58,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=521193.3333333333, ans=0.0 2023-11-19 02:22:05,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=521193.3333333333, ans=0.1 2023-11-19 02:22:07,960 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 6050, loss[loss=0.09261, simple_loss=0.1012, pruned_loss=0.03104, audio_tagging_loss=0.01095, over 14091.00 frames. ], tot_loss[loss=0.09349, simple_loss=0.1107, pruned_loss=0.02734, audio_tagging_loss=0.01078, over 3045384.00 frames. 
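Note: the grad_scale field tracks mixed-precision loss scaling for this fp16 run: it doubles after a stretch of overflow-free steps (16.0 -> 32.0 between batches 5950 and 6000 above) and halves when a step overflows. That pattern matches torch.cuda.amp.GradScaler with its default growth/backoff factors of 2.0 and 0.5; a minimal sketch of the loop, with model, optimizer and batch assumed:

    import torch

    scaler = torch.cuda.amp.GradScaler()  # growth 2.0, backoff 0.5 by default

    def train_step(model, optimizer, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # skipped internally if grads hold inf/nan
        scaler.update()         # halve on overflow, double after a clean run
        return loss.detach()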
], batch size: 56, lr: 9.87e-03, grad_scale: 32.0 2023-11-19 02:22:18,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=521260.0, ans=0.1 2023-11-19 02:22:25,822 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:22:32,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=521393.3333333333, ans=0.04949747468305833 2023-11-19 02:22:38,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=521393.3333333333, ans=0.125 2023-11-19 02:22:45,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=521460.0, ans=0.125 2023-11-19 02:22:58,128 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.00 vs. limit=12.0 2023-11-19 02:23:04,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=521593.3333333333, ans=0.1 2023-11-19 02:23:05,060 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 6100, loss[loss=0.05461, simple_loss=0.05504, pruned_loss=0.01157, audio_tagging_loss=0.01552, over 15065.00 frames. ], tot_loss[loss=0.09298, simple_loss=0.1102, pruned_loss=0.02716, audio_tagging_loss=0.01075, over 3046107.23 frames. ], batch size: 59, lr: 9.87e-03, grad_scale: 32.0 2023-11-19 02:23:12,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=521593.3333333333, ans=0.2 2023-11-19 02:23:23,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=521660.0, ans=0.125 2023-11-19 02:23:38,647 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.659e+01 8.713e+01 9.332e+01 1.071e+02 1.394e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-19 02:23:43,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=521793.3333333333, ans=0.125 2023-11-19 02:23:46,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=521793.3333333333, ans=0.125 2023-11-19 02:23:49,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=521860.0, ans=0.125 2023-11-19 02:23:49,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=521860.0, ans=0.125 2023-11-19 02:23:59,651 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 6150, loss[loss=0.1341, simple_loss=0.1646, pruned_loss=0.04589, audio_tagging_loss=0.005935, over 15364.00 frames. ], tot_loss[loss=0.09306, simple_loss=0.1103, pruned_loss=0.02721, audio_tagging_loss=0.01071, over 3043970.39 frames. 
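Note: the Whitening records compare a covariance statistic of a module's activations against a scheduled limit (named whitening_limit elsewhere in these logs); as I read icefall's scaling.py, gradients are nudged toward whiter features when the metric exceeds the limit. A hedged sketch of one such metric, which is 1.0 for perfectly white features and grows toward the channel count as one direction dominates (illustrative, not necessarily scaling.py's exact formula):

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels), assumed zero-mean for simplicity.
        # With eigenvalues lam_i of the covariance C, this computes
        # d * sum(lam_i^2) / (sum(lam_i))^2, which is 1.0 when all lam_i
        # are equal and approaches d for a rank-1 covariance.
        d = x.shape[-1]
        cov = x.t() @ x / x.shape[0]
        return d * (cov ** 2).sum() / (cov.trace() ** 2)

    x = torch.randn(1000, 384)
    print(whitening_metric(x))  # near 1 (sampling noise inflates it a little)
    x[:, 0] *= 10.0             # let one channel dominate
    print(whitening_metric(x))  # much larger, the kind of value flagged above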
], batch size: 55, lr: 9.86e-03, grad_scale: 32.0 2023-11-19 02:24:03,000 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=521926.6666666667, ans=0.125 2023-11-19 02:24:10,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=521993.3333333333, ans=0.125 2023-11-19 02:24:29,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=522060.0, ans=0.1 2023-11-19 02:24:38,093 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.41 vs. limit=15.0 2023-11-19 02:24:40,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=522126.6666666667, ans=0.125 2023-11-19 02:24:47,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=522193.3333333333, ans=0.125 2023-11-19 02:24:54,743 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 6200, loss[loss=0.08451, simple_loss=0.09728, pruned_loss=0.02444, audio_tagging_loss=0.01143, over 15088.00 frames. ], tot_loss[loss=0.09222, simple_loss=0.1091, pruned_loss=0.02678, audio_tagging_loss=0.01088, over 3049545.42 frames. ], batch size: 57, lr: 9.86e-03, grad_scale: 32.0 2023-11-19 02:24:57,358 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.59 vs. limit=15.0 2023-11-19 02:25:00,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=522260.0, ans=0.2 2023-11-19 02:25:02,095 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.16 vs. limit=10.0 2023-11-19 02:25:03,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=522260.0, ans=0.0 2023-11-19 02:25:28,330 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.684e+01 8.853e+01 9.668e+01 1.071e+02 1.653e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-19 02:25:48,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=522526.6666666667, ans=0.1 2023-11-19 02:25:50,572 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 6250, loss[loss=0.09319, simple_loss=0.1162, pruned_loss=0.02567, audio_tagging_loss=0.009397, over 16123.00 frames. ], tot_loss[loss=0.09161, simple_loss=0.108, pruned_loss=0.02656, audio_tagging_loss=0.01104, over 3051952.79 frames. ], batch size: 60, lr: 9.86e-03, grad_scale: 32.0 2023-11-19 02:26:09,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=522660.0, ans=0.125 2023-11-19 02:26:43,175 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.46 vs. limit=15.0 2023-11-19 02:26:43,285 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.20 vs. 
limit=15.0 2023-11-19 02:26:45,718 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 6300, loss[loss=0.08571, simple_loss=0.1061, pruned_loss=0.02639, audio_tagging_loss=0.006252, over 15560.00 frames. ], tot_loss[loss=0.09226, simple_loss=0.1088, pruned_loss=0.0268, audio_tagging_loss=0.01106, over 3052147.66 frames. ], batch size: 56, lr: 9.85e-03, grad_scale: 32.0 2023-11-19 02:26:52,831 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.86 vs. limit=22.5 2023-11-19 02:26:54,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=522926.6666666667, ans=0.1 2023-11-19 02:26:56,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=522993.3333333333, ans=0.125 2023-11-19 02:26:58,355 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.05 vs. limit=15.0 2023-11-19 02:27:06,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=522993.3333333333, ans=0.125 2023-11-19 02:27:22,023 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.961e+01 8.911e+01 9.683e+01 1.073e+02 1.509e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-19 02:27:26,709 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:27:41,852 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 6350, loss[loss=0.1146, simple_loss=0.1351, pruned_loss=0.03741, audio_tagging_loss=0.009652, over 16979.00 frames. ], tot_loss[loss=0.09269, simple_loss=0.1093, pruned_loss=0.02687, audio_tagging_loss=0.01114, over 3048210.97 frames. ], batch size: 62, lr: 9.85e-03, grad_scale: 16.0 2023-11-19 02:27:43,215 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:27:51,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=523260.0, ans=0.025 2023-11-19 02:28:09,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=523393.3333333333, ans=0.125 2023-11-19 02:28:13,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=523393.3333333333, ans=0.0 2023-11-19 02:28:30,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=523526.6666666667, ans=0.125 2023-11-19 02:28:38,392 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 6400, loss[loss=0.09139, simple_loss=0.1137, pruned_loss=0.02429, audio_tagging_loss=0.01027, over 14912.00 frames. ], tot_loss[loss=0.09358, simple_loss=0.1102, pruned_loss=0.02728, audio_tagging_loss=0.01119, over 3048241.21 frames. 
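Note: the lr field decays smoothly within the epoch (9.93e-03 at batch 5100 down to 9.85e-03 here), matching the shape of icefall's Eden schedule, lr = base_lr * ((b^2 + B^2)/B^2)^-0.25 * ((e^2 + E^2)/E^2)^-0.25 with B = lr_batches and E = lr_epochs. A standalone sketch under that assumption, with this run's B = 7500 and E = 3.5 as defaults (the formula is recalled, not verified against optim.py, and omits Eden's warm-up term):

    def eden_lr(base_lr: float, batch: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Eden-style decay in both the batch and the (fractional) epoch index.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor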
], batch size: 55, lr: 9.85e-03, grad_scale: 32.0 2023-11-19 02:28:43,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=523593.3333333333, ans=0.0 2023-11-19 02:28:58,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=523726.6666666667, ans=0.125 2023-11-19 02:29:03,169 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=15.0 2023-11-19 02:29:13,023 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.312e+01 8.426e+01 9.075e+01 1.046e+02 1.342e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-19 02:29:15,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=523793.3333333333, ans=0.0 2023-11-19 02:29:15,575 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.01 vs. limit=22.5 2023-11-19 02:29:29,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=523860.0, ans=0.125 2023-11-19 02:29:31,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=523860.0, ans=0.125 2023-11-19 02:29:33,089 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 6450, loss[loss=0.06899, simple_loss=0.07649, pruned_loss=0.01807, audio_tagging_loss=0.01267, over 14151.00 frames. ], tot_loss[loss=0.09395, simple_loss=0.1106, pruned_loss=0.02732, audio_tagging_loss=0.01134, over 3043679.43 frames. ], batch size: 54, lr: 9.85e-03, grad_scale: 32.0 2023-11-19 02:29:44,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=523993.3333333333, ans=0.125 2023-11-19 02:29:45,475 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.44 vs. limit=15.0 2023-11-19 02:29:46,850 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. limit=6.0 2023-11-19 02:29:49,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=523993.3333333333, ans=0.125 2023-11-19 02:29:54,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=524060.0, ans=0.125 2023-11-19 02:30:12,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=524126.6666666667, ans=0.0 2023-11-19 02:30:28,434 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 6500, loss[loss=0.09297, simple_loss=0.1164, pruned_loss=0.02203, audio_tagging_loss=0.01276, over 14357.00 frames. ], tot_loss[loss=0.09365, simple_loss=0.1104, pruned_loss=0.02725, audio_tagging_loss=0.0112, over 3047082.16 frames. 
], batch size: 55, lr: 9.84e-03, grad_scale: 32.0 2023-11-19 02:30:39,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=524326.6666666666, ans=0.025 2023-11-19 02:30:43,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=524326.6666666666, ans=0.125 2023-11-19 02:30:50,070 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.83 vs. limit=15.0 2023-11-19 02:31:04,410 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.918e+01 8.655e+01 9.206e+01 1.007e+02 1.269e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-19 02:31:05,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=524460.0, ans=0.125 2023-11-19 02:31:07,229 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.32 vs. limit=15.0 2023-11-19 02:31:12,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=524526.6666666666, ans=0.0 2023-11-19 02:31:13,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=524526.6666666666, ans=0.1 2023-11-19 02:31:24,664 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 6550, loss[loss=0.08258, simple_loss=0.09239, pruned_loss=0.02522, audio_tagging_loss=0.01117, over 15018.00 frames. ], tot_loss[loss=0.09397, simple_loss=0.111, pruned_loss=0.02748, audio_tagging_loss=0.01097, over 3046865.16 frames. ], batch size: 57, lr: 9.84e-03, grad_scale: 32.0 2023-11-19 02:31:43,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=524660.0, ans=0.125 2023-11-19 02:32:19,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=524926.6666666666, ans=0.1 2023-11-19 02:32:20,067 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 6600, loss[loss=0.09428, simple_loss=0.1137, pruned_loss=0.02671, audio_tagging_loss=0.01074, over 15330.00 frames. ], tot_loss[loss=0.09396, simple_loss=0.1109, pruned_loss=0.02759, audio_tagging_loss=0.01093, over 3043237.24 frames. 
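Note: tot_loss is reported "over" roughly 3.0e6 frames throughout, while individual batches carry about 15e3 frames, a ratio near 200. That is consistent with decayed running sums of loss and frame counts, reported as their ratio, with decay 1 - 1/200 (assuming reset_interval=200 as configured for this run). A sketch of that bookkeeping (illustrative, not icefall's MetricsTracker):

    class RunningLoss:
        def __init__(self, reset_interval: int = 200):
            self.decay = 1.0 - 1.0 / reset_interval
            self.loss_sum = 0.0
            self.frame_sum = 0.0

        def update(self, batch_loss_sum: float, batch_frames: float) -> None:
            # Decayed sums: the frame total settles near
            # reset_interval * batch_frames, i.e. 200 * 15e3 = 3.0e6 here.
            self.loss_sum = self.loss_sum * self.decay + batch_loss_sum
            self.frame_sum = self.frame_sum * self.decay + batch_frames

        @property
        def value(self) -> float:
            return self.loss_sum / max(self.frame_sum, 1.0)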
], batch size: 57, lr: 9.84e-03, grad_scale: 32.0 2023-11-19 02:32:35,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=524993.3333333334, ans=0.0 2023-11-19 02:32:45,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=525060.0, ans=0.0 2023-11-19 02:32:52,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=525126.6666666666, ans=0.125 2023-11-19 02:32:53,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=525126.6666666666, ans=0.0 2023-11-19 02:32:55,747 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.348e+01 8.622e+01 9.360e+01 1.038e+02 1.373e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-19 02:33:10,279 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0 2023-11-19 02:33:11,258 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.81 vs. limit=10.0 2023-11-19 02:33:14,897 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 6650, loss[loss=0.07226, simple_loss=0.08814, pruned_loss=0.01622, audio_tagging_loss=0.01197, over 15359.00 frames. ], tot_loss[loss=0.09372, simple_loss=0.1108, pruned_loss=0.02749, audio_tagging_loss=0.01084, over 3040059.85 frames. ], batch size: 59, lr: 9.83e-03, grad_scale: 32.0 2023-11-19 02:33:29,401 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.44 vs. limit=6.0 2023-11-19 02:33:33,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=525326.6666666666, ans=0.5 2023-11-19 02:33:34,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=525326.6666666666, ans=0.1 2023-11-19 02:33:42,117 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.70 vs. limit=10.0 2023-11-19 02:33:51,567 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.88 vs. limit=15.0 2023-11-19 02:34:10,662 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 6700, loss[loss=0.1089, simple_loss=0.1404, pruned_loss=0.02876, audio_tagging_loss=0.009952, over 15258.00 frames. ], tot_loss[loss=0.09346, simple_loss=0.1109, pruned_loss=0.02727, audio_tagging_loss=0.01075, over 3035862.67 frames. 
], batch size: 57, lr: 9.83e-03, grad_scale: 32.0 2023-11-19 02:34:18,306 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:34:45,466 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.124e+01 8.532e+01 9.161e+01 1.021e+02 1.335e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-19 02:34:52,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=525793.3333333334, ans=0.125 2023-11-19 02:34:58,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=525860.0, ans=0.0 2023-11-19 02:34:58,539 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=525860.0, ans=0.1 2023-11-19 02:35:05,522 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 6750, loss[loss=0.09379, simple_loss=0.1071, pruned_loss=0.0274, audio_tagging_loss=0.01281, over 14565.00 frames. ], tot_loss[loss=0.09339, simple_loss=0.1107, pruned_loss=0.02728, audio_tagging_loss=0.01075, over 3030565.85 frames. ], batch size: 55, lr: 9.83e-03, grad_scale: 32.0 2023-11-19 02:35:05,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=525926.6666666666, ans=0.0 2023-11-19 02:35:06,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=525926.6666666666, ans=0.2 2023-11-19 02:35:11,411 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.81 vs. limit=15.0 2023-11-19 02:35:26,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=526060.0, ans=0.0 2023-11-19 02:35:27,269 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.17 vs. limit=22.5 2023-11-19 02:35:29,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=526060.0, ans=0.125 2023-11-19 02:35:37,005 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.64 vs. limit=15.0 2023-11-19 02:35:39,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=526126.6666666666, ans=0.07 2023-11-19 02:35:42,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=526126.6666666666, ans=0.07 2023-11-19 02:35:46,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=526126.6666666666, ans=0.2 2023-11-19 02:35:51,093 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.83 vs. limit=22.5 2023-11-19 02:35:53,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=526193.3333333334, ans=0.125 2023-11-19 02:36:00,006 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 6800, loss[loss=0.08991, simple_loss=0.113, pruned_loss=0.02188, audio_tagging_loss=0.01154, over 16139.00 frames. 
], tot_loss[loss=0.09376, simple_loss=0.1113, pruned_loss=0.02734, audio_tagging_loss=0.01077, over 3034095.56 frames. ], batch size: 60, lr: 9.82e-03, grad_scale: 32.0 2023-11-19 02:36:24,206 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.87 vs. limit=12.0 2023-11-19 02:36:35,399 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.668e+01 8.914e+01 9.908e+01 1.073e+02 1.400e+02, threshold=1.982e+02, percent-clipped=0.0 2023-11-19 02:36:51,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=526526.6666666666, ans=0.1 2023-11-19 02:36:54,908 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 6850, loss[loss=0.1098, simple_loss=0.1341, pruned_loss=0.0332, audio_tagging_loss=0.009487, over 15470.00 frames. ], tot_loss[loss=0.09428, simple_loss=0.1121, pruned_loss=0.02745, audio_tagging_loss=0.0108, over 3034744.94 frames. ], batch size: 56, lr: 9.82e-03, grad_scale: 32.0 2023-11-19 02:36:59,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=526593.3333333334, ans=0.125 2023-11-19 02:37:12,879 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.17 vs. limit=15.0 2023-11-19 02:37:27,051 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=526793.3333333334, ans=0.125 2023-11-19 02:37:33,777 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.47 vs. limit=10.0 2023-11-19 02:37:35,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=526793.3333333334, ans=0.125 2023-11-19 02:37:36,334 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.93 vs. limit=12.0 2023-11-19 02:37:37,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=526793.3333333334, ans=0.125 2023-11-19 02:37:47,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=526860.0, ans=0.0 2023-11-19 02:37:50,682 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.29 vs. limit=15.0 2023-11-19 02:37:51,108 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 6900, loss[loss=0.09385, simple_loss=0.1097, pruned_loss=0.03006, audio_tagging_loss=0.008942, over 14833.00 frames. ], tot_loss[loss=0.09429, simple_loss=0.1124, pruned_loss=0.02739, audio_tagging_loss=0.01072, over 3041243.96 frames. ], batch size: 56, lr: 9.82e-03, grad_scale: 32.0 2023-11-19 02:37:53,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=526926.6666666666, ans=0.125 2023-11-19 02:38:26,434 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.916e+01 8.284e+01 8.942e+01 9.819e+01 1.283e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-19 02:38:34,405 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 02:38:41,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=527193.3333333334, ans=0.1 2023-11-19 02:38:45,956 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 6950, loss[loss=0.08207, simple_loss=0.1031, pruned_loss=0.02322, audio_tagging_loss=0.007297, over 15546.00 frames. ], tot_loss[loss=0.09388, simple_loss=0.112, pruned_loss=0.02721, audio_tagging_loss=0.01069, over 3042393.97 frames. ], batch size: 58, lr: 9.81e-03, grad_scale: 32.0 2023-11-19 02:39:14,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=527393.3333333334, ans=0.0 2023-11-19 02:39:28,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=527460.0, ans=0.125 2023-11-19 02:39:35,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=527526.6666666666, ans=0.125 2023-11-19 02:39:40,942 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 7000, loss[loss=0.1089, simple_loss=0.1341, pruned_loss=0.03055, audio_tagging_loss=0.0113, over 15665.00 frames. ], tot_loss[loss=0.09355, simple_loss=0.1115, pruned_loss=0.02704, audio_tagging_loss=0.01075, over 3042390.75 frames. ], batch size: 58, lr: 9.81e-03, grad_scale: 32.0 2023-11-19 02:39:44,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=527593.3333333334, ans=0.0 2023-11-19 02:39:48,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=527593.3333333334, ans=0.2 2023-11-19 02:40:09,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=527726.6666666666, ans=0.1 2023-11-19 02:40:16,973 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.756e+01 9.618e+01 1.077e+02 1.519e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-19 02:40:21,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=527793.3333333334, ans=0.125 2023-11-19 02:40:26,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=527860.0, ans=0.125 2023-11-19 02:40:32,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=527860.0, ans=0.125 2023-11-19 02:40:37,802 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 7050, loss[loss=0.06782, simple_loss=0.07759, pruned_loss=0.0164, audio_tagging_loss=0.01262, over 15667.00 frames. ], tot_loss[loss=0.09308, simple_loss=0.1107, pruned_loss=0.02679, audio_tagging_loss=0.01094, over 3044971.76 frames. 
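Note: the token list in these exclusion warnings comes from the run's BPE model, and the count of 24 can be checked directly with sentencepiece. A sketch (the model path is the one configured for this run and is not re-verified here; the count may also shift with text normalization, so treat the expected output as specific to this model):

    import sentencepiece as spm

    sp = spm.SentencePieceProcessor(model_file="data/lang_bpe_500/bpe.model")
    pieces = sp.encode(
        "Dummy text added as a place holder. Please ignore this if possible",
        out_type=str)
    print(len(pieces))  # 24 with this run's model, per the warnings above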
], batch size: 61, lr: 9.81e-03, grad_scale: 32.0 2023-11-19 02:40:49,220 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=527993.3333333334, ans=0.09899494936611666 2023-11-19 02:41:17,731 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.01 vs. limit=15.0 2023-11-19 02:41:29,660 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.109e-01 2023-11-19 02:41:33,639 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 7100, loss[loss=0.1158, simple_loss=0.1345, pruned_loss=0.03917, audio_tagging_loss=0.009353, over 15569.00 frames. ], tot_loss[loss=0.09343, simple_loss=0.111, pruned_loss=0.0269, audio_tagging_loss=0.01103, over 3045464.64 frames. ], batch size: 55, lr: 9.81e-03, grad_scale: 32.0 2023-11-19 02:41:51,325 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.10 vs. limit=22.5 2023-11-19 02:42:01,103 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.52 vs. limit=15.0 2023-11-19 02:42:05,415 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.57 vs. limit=6.0 2023-11-19 02:42:08,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=528460.0, ans=0.1 2023-11-19 02:42:09,472 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.659e+01 8.375e+01 9.204e+01 1.018e+02 1.304e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-19 02:42:26,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=528526.6666666666, ans=0.125 2023-11-19 02:42:28,583 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 7150, loss[loss=0.1004, simple_loss=0.1209, pruned_loss=0.02834, audio_tagging_loss=0.01163, over 14123.00 frames. ], tot_loss[loss=0.0936, simple_loss=0.1109, pruned_loss=0.02707, audio_tagging_loss=0.01105, over 3039998.73 frames. ], batch size: 56, lr: 9.80e-03, grad_scale: 32.0 2023-11-19 02:42:49,813 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.60 vs. limit=6.0 2023-11-19 02:43:04,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=528793.3333333334, ans=0.0 2023-11-19 02:43:19,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=528860.0, ans=0.02 2023-11-19 02:43:25,053 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 7200, loss[loss=0.07727, simple_loss=0.08731, pruned_loss=0.02193, audio_tagging_loss=0.01169, over 15123.00 frames. ], tot_loss[loss=0.09352, simple_loss=0.1109, pruned_loss=0.02692, audio_tagging_loss=0.01115, over 3044554.70 frames. 
], batch size: 57, lr: 9.80e-03, grad_scale: 32.0 2023-11-19 02:43:28,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=528926.6666666666, ans=0.1 2023-11-19 02:43:47,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=529060.0, ans=0.125 2023-11-19 02:43:51,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=529060.0, ans=0.125 2023-11-19 02:43:59,943 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.223e+01 8.683e+01 9.470e+01 1.039e+02 1.567e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-19 02:44:01,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=529126.6666666666, ans=0.125 2023-11-19 02:44:10,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=529193.3333333334, ans=0.125 2023-11-19 02:44:20,482 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 7250, loss[loss=0.07592, simple_loss=0.0867, pruned_loss=0.01877, audio_tagging_loss=0.01379, over 14770.00 frames. ], tot_loss[loss=0.09333, simple_loss=0.1107, pruned_loss=0.02683, audio_tagging_loss=0.01116, over 3041888.91 frames. ], batch size: 57, lr: 9.80e-03, grad_scale: 32.0 2023-11-19 02:44:27,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=529260.0, ans=0.035 2023-11-19 02:44:31,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=529326.6666666666, ans=0.125 2023-11-19 02:44:32,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=529326.6666666666, ans=0.125 2023-11-19 02:44:35,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=529326.6666666666, ans=0.125 2023-11-19 02:45:09,889 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.97 vs. limit=15.0 2023-11-19 02:45:12,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=529526.6666666666, ans=0.125 2023-11-19 02:45:15,824 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 7300, loss[loss=0.0717, simple_loss=0.08482, pruned_loss=0.01665, audio_tagging_loss=0.01263, over 15101.00 frames. ], tot_loss[loss=0.09422, simple_loss=0.1121, pruned_loss=0.02716, audio_tagging_loss=0.011, over 3033336.80 frames. ], batch size: 57, lr: 9.79e-03, grad_scale: 32.0 2023-11-19 02:45:18,751 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.07 vs. limit=8.0 2023-11-19 02:45:30,743 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.85 vs. 
2023-11-19 02:45:31,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=529660.0, ans=0.2 2023-11-19 02:45:43,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=529726.6666666666, ans=0.1 2023-11-19 02:45:51,672 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.087e+01 8.930e+01 9.656e+01 1.070e+02 1.553e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-19 02:45:57,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=529793.3333333334, ans=0.125 2023-11-19 02:45:59,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=529860.0, ans=0.125 2023-11-19 02:46:07,971 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.98 vs. limit=22.5 2023-11-19 02:46:08,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=529860.0, ans=0.0 2023-11-19 02:46:12,163 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 7350, loss[loss=0.1057, simple_loss=0.1297, pruned_loss=0.03294, audio_tagging_loss=0.007874, over 14609.00 frames. ], tot_loss[loss=0.09392, simple_loss=0.1119, pruned_loss=0.02725, audio_tagging_loss=0.01074, over 3038135.86 frames. ], batch size: 53, lr: 9.79e-03, grad_scale: 32.0 2023-11-19 02:47:07,075 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 7400, loss[loss=0.1082, simple_loss=0.1348, pruned_loss=0.02756, audio_tagging_loss=0.01324, over 13936.00 frames. ], tot_loss[loss=0.0931, simple_loss=0.1111, pruned_loss=0.02694, audio_tagging_loss=0.01062, over 3038627.28 frames. ], batch size: 53, lr: 9.79e-03, grad_scale: 32.0 2023-11-19 02:47:07,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=530260.0, ans=0.0 2023-11-19 02:47:15,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=530260.0, ans=0.1 2023-11-19 02:47:23,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=530326.6666666666, ans=0.0 2023-11-19 02:47:30,010 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.35 vs. limit=15.0 2023-11-19 02:47:32,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=530393.3333333334, ans=0.0 2023-11-19 02:47:41,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=530460.0, ans=0.1 2023-11-19 02:47:43,077 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.078e+01 8.685e+01 9.299e+01 1.013e+02 1.325e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 02:48:02,732 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 7450, loss[loss=0.1012, simple_loss=0.128, pruned_loss=0.0311, audio_tagging_loss=0.006121, over 15184.00 frames. ], tot_loss[loss=0.09251, simple_loss=0.1103, pruned_loss=0.02675, audio_tagging_loss=0.0106, over 3043441.56 frames.
], batch size: 57, lr: 9.78e-03, grad_scale: 32.0 2023-11-19 02:48:06,325 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.07 vs. limit=15.0 2023-11-19 02:48:08,939 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=22.5 2023-11-19 02:48:10,353 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.26 vs. limit=15.0 2023-11-19 02:48:11,441 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.36 vs. limit=12.0 2023-11-19 02:48:16,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=530660.0, ans=0.125 2023-11-19 02:48:25,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=530726.6666666666, ans=0.125 2023-11-19 02:48:50,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=530860.0, ans=0.0 2023-11-19 02:48:59,269 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 7500, loss[loss=0.0749, simple_loss=0.08357, pruned_loss=0.02178, audio_tagging_loss=0.01134, over 14184.00 frames. ], tot_loss[loss=0.09241, simple_loss=0.1101, pruned_loss=0.02681, audio_tagging_loss=0.01056, over 3045086.59 frames. ], batch size: 54, lr: 9.78e-03, grad_scale: 32.0 2023-11-19 02:49:00,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=530926.6666666666, ans=0.0 2023-11-19 02:49:03,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=530926.6666666666, ans=0.0 2023-11-19 02:49:17,543 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.01 vs. limit=12.0 2023-11-19 02:49:30,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=531126.6666666666, ans=0.0 2023-11-19 02:49:33,664 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.745e+01 8.748e+01 9.301e+01 1.047e+02 1.348e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 02:49:53,801 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 7550, loss[loss=0.09147, simple_loss=0.109, pruned_loss=0.02879, audio_tagging_loss=0.008177, over 16100.00 frames. ], tot_loss[loss=0.09314, simple_loss=0.1111, pruned_loss=0.02719, audio_tagging_loss=0.01042, over 3050205.79 frames. ], batch size: 60, lr: 9.78e-03, grad_scale: 32.0 2023-11-19 02:50:00,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=531260.0, ans=0.1 2023-11-19 02:50:09,591 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.08 vs. limit=10.0 2023-11-19 02:50:16,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=531393.3333333334, ans=10.0 2023-11-19 02:50:17,705 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.70 vs. 
limit=15.0 2023-11-19 02:50:48,814 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 7600, loss[loss=0.07049, simple_loss=0.07964, pruned_loss=0.01951, audio_tagging_loss=0.01117, over 14775.00 frames. ], tot_loss[loss=0.09312, simple_loss=0.1109, pruned_loss=0.02711, audio_tagging_loss=0.01054, over 3052718.22 frames. ], batch size: 56, lr: 9.77e-03, grad_scale: 32.0 2023-11-19 02:50:53,053 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=11.38 vs. limit=12.0 2023-11-19 02:51:02,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=531660.0, ans=0.125 2023-11-19 02:51:06,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=531660.0, ans=0.125 2023-11-19 02:51:07,595 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.78 vs. limit=15.0 2023-11-19 02:51:10,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=531660.0, ans=0.125 2023-11-19 02:51:14,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=531726.6666666666, ans=0.0 2023-11-19 02:51:16,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=531726.6666666666, ans=0.2 2023-11-19 02:51:24,908 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 8.650e+01 9.572e+01 1.070e+02 1.390e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-19 02:51:45,438 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 7650, loss[loss=0.08212, simple_loss=0.1001, pruned_loss=0.02263, audio_tagging_loss=0.009428, over 15252.00 frames. ], tot_loss[loss=0.09311, simple_loss=0.111, pruned_loss=0.0271, audio_tagging_loss=0.01053, over 3050193.76 frames. ], batch size: 55, lr: 9.77e-03, grad_scale: 16.0 2023-11-19 02:52:00,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=531993.3333333334, ans=0.0 2023-11-19 02:52:03,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=531993.3333333334, ans=0.2 2023-11-19 02:52:06,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=532060.0, ans=0.0 2023-11-19 02:52:06,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=532060.0, ans=0.1 2023-11-19 02:52:20,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=532126.6666666666, ans=0.0 2023-11-19 02:52:23,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=532126.6666666666, ans=0.1 2023-11-19 02:52:41,030 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 7700, loss[loss=0.1011, simple_loss=0.1077, pruned_loss=0.03675, audio_tagging_loss=0.01053, over 15939.00 frames. ], tot_loss[loss=0.09322, simple_loss=0.1109, pruned_loss=0.02714, audio_tagging_loss=0.01065, over 3053443.58 frames. 
], batch size: 60, lr: 9.77e-03, grad_scale: 16.0 2023-11-19 02:52:51,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=532326.6666666666, ans=0.125 2023-11-19 02:53:00,280 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:53:15,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=532460.0, ans=0.2 2023-11-19 02:53:17,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=532460.0, ans=0.0 2023-11-19 02:53:17,948 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.485e+01 9.381e+01 1.068e+02 1.739e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-19 02:53:35,820 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 7750, loss[loss=0.1035, simple_loss=0.1167, pruned_loss=0.03358, audio_tagging_loss=0.01153, over 16178.00 frames. ], tot_loss[loss=0.09382, simple_loss=0.1116, pruned_loss=0.02725, audio_tagging_loss=0.01076, over 3055157.79 frames. ], batch size: 60, lr: 9.77e-03, grad_scale: 16.0 2023-11-19 02:53:44,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=532593.3333333334, ans=0.04949747468305833 2023-11-19 02:53:49,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=532660.0, ans=0.1 2023-11-19 02:53:52,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=532660.0, ans=0.1 2023-11-19 02:53:54,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=532660.0, ans=0.0 2023-11-19 02:54:05,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=532726.6666666666, ans=0.1 2023-11-19 02:54:15,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=532793.3333333334, ans=0.0 2023-11-19 02:54:31,614 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 7800, loss[loss=0.108, simple_loss=0.1247, pruned_loss=0.03192, audio_tagging_loss=0.01373, over 15750.00 frames. ], tot_loss[loss=0.094, simple_loss=0.1118, pruned_loss=0.02728, audio_tagging_loss=0.01083, over 3059896.30 frames. ], batch size: 57, lr: 9.76e-03, grad_scale: 16.0 2023-11-19 02:55:07,936 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.958e+01 8.565e+01 9.591e+01 1.072e+02 1.947e+02, threshold=1.918e+02, percent-clipped=1.0 2023-11-19 02:55:27,527 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 7850, loss[loss=0.08401, simple_loss=0.1071, pruned_loss=0.01891, audio_tagging_loss=0.01156, over 15999.00 frames. ], tot_loss[loss=0.09446, simple_loss=0.1122, pruned_loss=0.02753, audio_tagging_loss=0.01085, over 3056628.77 frames. 
], batch size: 59, lr: 9.76e-03, grad_scale: 16.0 2023-11-19 02:55:37,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=533326.6666666666, ans=0.0 2023-11-19 02:55:54,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=533393.3333333334, ans=0.1 2023-11-19 02:56:07,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=533460.0, ans=0.1 2023-11-19 02:56:10,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=533460.0, ans=0.035 2023-11-19 02:56:12,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=533460.0, ans=0.2 2023-11-19 02:56:24,748 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 7900, loss[loss=0.06225, simple_loss=0.07163, pruned_loss=0.01438, audio_tagging_loss=0.01206, over 14989.00 frames. ], tot_loss[loss=0.09463, simple_loss=0.1123, pruned_loss=0.02764, audio_tagging_loss=0.01084, over 3056283.47 frames. ], batch size: 58, lr: 9.76e-03, grad_scale: 16.0 2023-11-19 02:56:25,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=533593.3333333334, ans=0.2 2023-11-19 02:56:25,243 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.12 vs. limit=15.0 2023-11-19 02:56:29,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=533593.3333333334, ans=0.125 2023-11-19 02:56:30,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=533593.3333333334, ans=0.2 2023-11-19 02:56:33,643 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.92 vs. limit=15.0 2023-11-19 02:56:37,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=533660.0, ans=0.0 2023-11-19 02:57:01,470 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.359e+01 8.512e+01 9.298e+01 1.008e+02 1.380e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 02:57:18,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=533860.0, ans=0.2 2023-11-19 02:57:19,987 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 7950, loss[loss=0.09191, simple_loss=0.1094, pruned_loss=0.02781, audio_tagging_loss=0.009373, over 15216.00 frames. ], tot_loss[loss=0.09392, simple_loss=0.1112, pruned_loss=0.02734, audio_tagging_loss=0.01098, over 3052246.67 frames. ], batch size: 56, lr: 9.75e-03, grad_scale: 16.0 2023-11-19 02:57:34,938 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
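The WARNING above shows why AudioSet clips carrying the "Dummy text" placeholder transcript are dropped: after 4x convolutional subsampling, a 100-frame (one-second) cut yields only 23 encoder frames, fewer than its 24 BPE tokens, and a transducer loss needs at least one encoder frame per output symbol. A small filter in that spirit; the length formula below is an assumption chosen to reproduce the 100 -> 23 mapping printed in the warning:

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        """Return False for cuts whose encoder output would be shorter
        than the token sequence (the transducer loss is then infeasible)."""
        # Conv2dSubsampling-style length arithmetic; maps 100 -> 23.
        frames_after_subsampling = ((num_frames - 7) // 2 + 1) // 2
        return frames_after_subsampling >= num_tokens

    assert keep_cut(100, 24) is False   # the excluded placeholder cut above
    assert keep_cut(1000, 24) is True   # a normal ten-second utterance

The transcript itself is only a placeholder ("Please ignore this if possible"), so the warning is about loss feasibility rather than lost transcriptions.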
2023-11-19 02:57:37,599 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.23 vs. limit=15.0 2023-11-19 02:57:41,735 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.14 vs. limit=6.0 2023-11-19 02:57:42,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=534060.0, ans=0.0 2023-11-19 02:57:46,977 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.49 vs. limit=15.0 2023-11-19 02:58:08,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=534193.3333333334, ans=0.125 2023-11-19 02:58:16,030 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 8000, loss[loss=0.08934, simple_loss=0.107, pruned_loss=0.02594, audio_tagging_loss=0.009901, over 13921.00 frames. ], tot_loss[loss=0.09361, simple_loss=0.1106, pruned_loss=0.02723, audio_tagging_loss=0.01107, over 3036064.91 frames. ], batch size: 55, lr: 9.75e-03, grad_scale: 32.0 2023-11-19 02:58:29,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=534326.6666666666, ans=0.2 2023-11-19 02:58:37,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=534393.3333333334, ans=0.95 2023-11-19 02:58:42,589 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2023-11-19 02:58:52,225 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.833e+01 8.488e+01 9.029e+01 9.898e+01 1.404e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-19 02:58:57,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=534460.0, ans=0.1 2023-11-19 02:58:57,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=534460.0, ans=0.025 2023-11-19 02:59:10,641 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 8050, loss[loss=0.1138, simple_loss=0.1361, pruned_loss=0.0371, audio_tagging_loss=0.008681, over 14667.00 frames. ], tot_loss[loss=0.09387, simple_loss=0.1109, pruned_loss=0.02728, audio_tagging_loss=0.01116, over 3031544.70 frames. ], batch size: 55, lr: 9.75e-03, grad_scale: 32.0 2023-11-19 02:59:11,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=534593.3333333334, ans=0.125 2023-11-19 02:59:19,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=534593.3333333334, ans=0.1 2023-11-19 02:59:34,047 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.71 vs. limit=15.0
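Most scaling.py:213 lines print a ScheduledFloat: a regularisation constant (dropout probability, skip rate, balancer bound, and so on) that is interpolated piecewise-linearly in batch_count, with ans apparently being its current value; by batch_count around 5.3e5 most schedules have settled at their final values (ans=0.1, ans=0.125, ...). A sketch of such a schedule; the breakpoints in the example are illustrative, not the recipe's actual numbers:

    import bisect

    class ScheduledFloat:
        """A float-valued, piecewise-linear function of batch_count."""

        def __init__(self, *points):  # points: (batch_count, value), ascending
            self.xs = [x for x, _ in points]
            self.ys = [y for _, y in points]

        def __call__(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # Illustrative: strong dropout early, relaxed after 20k batches.
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p(534060.0))  # far past the last breakpoint -> 0.1, as logged

Scheduling regularisation this way lets the recipe stabilise early training without paying the same tax at convergence.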
2023-11-19 02:59:35,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=534726.6666666666, ans=0.125 2023-11-19 03:00:06,532 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 8100, loss[loss=0.09841, simple_loss=0.1248, pruned_loss=0.02677, audio_tagging_loss=0.009229, over 15962.00 frames. ], tot_loss[loss=0.09461, simple_loss=0.1118, pruned_loss=0.02773, audio_tagging_loss=0.01096, over 3042360.75 frames. ], batch size: 56, lr: 9.74e-03, grad_scale: 32.0 2023-11-19 03:00:08,161 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.50 vs. limit=6.0 2023-11-19 03:00:15,526 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.54 vs. limit=6.0 2023-11-19 03:00:16,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=534926.6666666666, ans=0.0 2023-11-19 03:00:42,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=535126.6666666666, ans=0.125 2023-11-19 03:00:42,976 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.187e+01 8.821e+01 9.637e+01 1.043e+02 1.464e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-19 03:00:48,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=535126.6666666666, ans=0.125 2023-11-19 03:01:01,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=535260.0, ans=0.025 2023-11-19 03:01:02,686 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 8150, loss[loss=0.06767, simple_loss=0.07966, pruned_loss=0.01419, audio_tagging_loss=0.01365, over 15626.00 frames. ], tot_loss[loss=0.09376, simple_loss=0.1111, pruned_loss=0.02745, audio_tagging_loss=0.01074, over 3046982.83 frames. ], batch size: 58, lr: 9.74e-03, grad_scale: 32.0 2023-11-19 03:01:16,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=535326.6666666666, ans=0.125 2023-11-19 03:01:18,854 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.62 vs.
limit=12.0 2023-11-19 03:01:22,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=535326.6666666666, ans=0.125 2023-11-19 03:01:32,254 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=535393.3333333334, ans=0.035 2023-11-19 03:01:34,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=535460.0, ans=10.0 2023-11-19 03:01:40,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=535460.0, ans=0.2 2023-11-19 03:01:55,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=535526.6666666666, ans=0.0 2023-11-19 03:01:55,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=535526.6666666666, ans=0.125 2023-11-19 03:01:56,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=535526.6666666666, ans=0.1 2023-11-19 03:01:57,924 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 8200, loss[loss=0.07487, simple_loss=0.08105, pruned_loss=0.02121, audio_tagging_loss=0.01314, over 14704.00 frames. ], tot_loss[loss=0.09358, simple_loss=0.1113, pruned_loss=0.02726, audio_tagging_loss=0.01069, over 3043808.25 frames. ], batch size: 57, lr: 9.74e-03, grad_scale: 32.0 2023-11-19 03:02:00,013 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 03:02:07,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=535660.0, ans=0.2 2023-11-19 03:02:34,968 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.718e+01 8.630e+01 9.276e+01 1.032e+02 1.538e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-19 03:02:43,945 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.20 vs. limit=15.0 2023-11-19 03:02:53,479 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 8250, loss[loss=0.1105, simple_loss=0.1345, pruned_loss=0.03322, audio_tagging_loss=0.009969, over 14906.00 frames. ], tot_loss[loss=0.0928, simple_loss=0.1102, pruned_loss=0.02696, audio_tagging_loss=0.01071, over 3049279.20 frames. ], batch size: 54, lr: 9.74e-03, grad_scale: 32.0 2023-11-19 03:03:12,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=535993.3333333334, ans=0.125 2023-11-19 03:03:36,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=536126.6666666666, ans=0.125 2023-11-19 03:03:38,839 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.29 vs. 
limit=22.5 2023-11-19 03:03:39,662 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.34 vs. limit=12.0 2023-11-19 03:03:45,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=536193.3333333334, ans=0.0 2023-11-19 03:03:49,081 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 03:03:49,901 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 8300, loss[loss=0.07735, simple_loss=0.09567, pruned_loss=0.01966, audio_tagging_loss=0.00986, over 15379.00 frames. ], tot_loss[loss=0.09381, simple_loss=0.1118, pruned_loss=0.02728, audio_tagging_loss=0.01064, over 3052136.23 frames. ], batch size: 57, lr: 9.73e-03, grad_scale: 32.0 2023-11-19 03:03:56,795 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.22 vs. limit=15.0 2023-11-19 03:04:04,310 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=536326.6666666666, ans=0.125 2023-11-19 03:04:10,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=536393.3333333334, ans=0.0 2023-11-19 03:04:16,170 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.08 vs. limit=22.5 2023-11-19 03:04:27,420 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.080e+01 8.802e+01 9.688e+01 1.089e+02 1.659e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-19 03:04:41,105 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.69 vs. limit=15.0 2023-11-19 03:04:43,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=536526.6666666666, ans=0.125 2023-11-19 03:04:45,392 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 8350, loss[loss=0.08888, simple_loss=0.1088, pruned_loss=0.02501, audio_tagging_loss=0.009455, over 15223.00 frames. ], tot_loss[loss=0.0936, simple_loss=0.1115, pruned_loss=0.02722, audio_tagging_loss=0.01061, over 3058980.77 frames. ], batch size: 56, lr: 9.73e-03, grad_scale: 16.0 2023-11-19 03:04:58,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=536660.0, ans=0.125 2023-11-19 03:05:04,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=536660.0, ans=0.1 2023-11-19 03:05:07,730 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.76 vs. 
limit=15.0 2023-11-19 03:05:19,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=536793.3333333334, ans=0.2 2023-11-19 03:05:22,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=536793.3333333334, ans=0.025 2023-11-19 03:05:28,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=536793.3333333334, ans=0.125 2023-11-19 03:05:29,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=536860.0, ans=0.125 2023-11-19 03:05:37,690 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.76 vs. limit=15.0 2023-11-19 03:05:40,346 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 8400, loss[loss=0.0984, simple_loss=0.1232, pruned_loss=0.02806, audio_tagging_loss=0.008731, over 15676.00 frames. ], tot_loss[loss=0.09261, simple_loss=0.1107, pruned_loss=0.02677, audio_tagging_loss=0.01048, over 3051632.40 frames. ], batch size: 57, lr: 9.73e-03, grad_scale: 32.0 2023-11-19 03:05:43,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=536926.6666666666, ans=0.025 2023-11-19 03:06:04,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=537060.0, ans=0.125 2023-11-19 03:06:11,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=537060.0, ans=0.125 2023-11-19 03:06:15,474 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 03:06:18,483 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.875e+01 8.679e+01 9.349e+01 1.017e+02 2.307e+02, threshold=1.870e+02, percent-clipped=1.0 2023-11-19 03:06:31,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=537193.3333333334, ans=0.125 2023-11-19 03:06:36,880 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 8450, loss[loss=0.06808, simple_loss=0.08062, pruned_loss=0.0168, audio_tagging_loss=0.01097, over 15008.00 frames. ], tot_loss[loss=0.09232, simple_loss=0.1098, pruned_loss=0.02676, audio_tagging_loss=0.01066, over 3055798.09 frames. ], batch size: 57, lr: 9.72e-03, grad_scale: 32.0 2023-11-19 03:06:43,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=537260.0, ans=0.125 2023-11-19 03:06:47,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=537326.6666666666, ans=0.09899494936611666 2023-11-19 03:07:16,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=537460.0, ans=0.1 2023-11-19 03:07:31,470 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 8500, loss[loss=0.08972, simple_loss=0.1077, pruned_loss=0.02462, audio_tagging_loss=0.01123, over 14134.00 frames. ], tot_loss[loss=0.09266, simple_loss=0.1103, pruned_loss=0.02693, audio_tagging_loss=0.0106, over 3056235.94 frames. 
], batch size: 53, lr: 9.72e-03, grad_scale: 32.0 2023-11-19 03:07:41,961 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=15.0 2023-11-19 03:07:53,421 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.33 vs. limit=15.0 2023-11-19 03:08:00,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=537726.6666666666, ans=0.125 2023-11-19 03:08:09,031 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.775e+01 8.768e+01 9.309e+01 1.039e+02 1.379e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-19 03:08:12,460 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 03:08:18,531 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.65 vs. limit=15.0 2023-11-19 03:08:19,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=537860.0, ans=0.0 2023-11-19 03:08:26,576 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 8550, loss[loss=0.09154, simple_loss=0.106, pruned_loss=0.02558, audio_tagging_loss=0.01294, over 15004.00 frames. ], tot_loss[loss=0.09321, simple_loss=0.1108, pruned_loss=0.02704, audio_tagging_loss=0.01075, over 3053683.22 frames. ], batch size: 56, lr: 9.72e-03, grad_scale: 32.0 2023-11-19 03:08:34,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=537926.6666666666, ans=0.125 2023-11-19 03:08:43,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=537993.3333333334, ans=0.125 2023-11-19 03:08:50,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=538060.0, ans=0.2 2023-11-19 03:09:07,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=538126.6666666666, ans=0.0 2023-11-19 03:09:11,093 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=538193.3333333334, ans=0.125 2023-11-19 03:09:22,953 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 8600, loss[loss=0.104, simple_loss=0.1343, pruned_loss=0.02808, audio_tagging_loss=0.008704, over 15595.00 frames. ], tot_loss[loss=0.09282, simple_loss=0.11, pruned_loss=0.02692, audio_tagging_loss=0.0109, over 3047988.96 frames. ], batch size: 58, lr: 9.71e-03, grad_scale: 32.0 2023-11-19 03:09:23,521 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.43 vs. 
limit=15.0 2023-11-19 03:09:53,140 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=538393.3333333334, ans=0.1 2023-11-19 03:09:59,805 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.620e+01 8.913e+01 9.582e+01 1.068e+02 1.371e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-19 03:10:05,329 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=538460.0, ans=0.125 2023-11-19 03:10:07,644 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.49 vs. limit=15.0 2023-11-19 03:10:17,868 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 8650, loss[loss=0.1081, simple_loss=0.1228, pruned_loss=0.03875, audio_tagging_loss=0.007926, over 14718.00 frames. ], tot_loss[loss=0.09283, simple_loss=0.1101, pruned_loss=0.02688, audio_tagging_loss=0.01091, over 3043638.74 frames. ], batch size: 56, lr: 9.71e-03, grad_scale: 32.0 2023-11-19 03:10:22,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=538593.3333333334, ans=0.125 2023-11-19 03:10:30,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=538660.0, ans=0.125 2023-11-19 03:10:40,064 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.89 vs. limit=22.5 2023-11-19 03:10:42,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=538726.6666666666, ans=0.1 2023-11-19 03:10:53,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=538793.3333333334, ans=0.125 2023-11-19 03:11:13,649 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 8700, loss[loss=0.07806, simple_loss=0.0866, pruned_loss=0.02023, audio_tagging_loss=0.01454, over 14202.00 frames. ], tot_loss[loss=0.09264, simple_loss=0.1096, pruned_loss=0.02679, audio_tagging_loss=0.01105, over 3052253.52 frames. ], batch size: 55, lr: 9.71e-03, grad_scale: 32.0 2023-11-19 03:11:27,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=538993.3333333334, ans=0.025 2023-11-19 03:11:33,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=538993.3333333334, ans=0.2 2023-11-19 03:11:44,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=539060.0, ans=0.125 2023-11-19 03:11:45,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=539126.6666666666, ans=0.0 2023-11-19 03:11:50,664 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.247e+01 8.784e+01 9.683e+01 1.064e+02 1.511e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-19 03:11:57,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=539193.3333333334, ans=0.1 2023-11-19 03:12:09,104 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 8750, loss[loss=0.102, simple_loss=0.1227, pruned_loss=0.02803, audio_tagging_loss=0.01265, over 15503.00 frames. 
], tot_loss[loss=0.09316, simple_loss=0.1102, pruned_loss=0.02699, audio_tagging_loss=0.01108, over 3049489.99 frames. ], batch size: 58, lr: 9.71e-03, grad_scale: 32.0 2023-11-19 03:12:17,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=539260.0, ans=0.07 2023-11-19 03:12:19,683 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.79 vs. limit=15.0 2023-11-19 03:12:22,779 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.85 vs. limit=15.0 2023-11-19 03:12:30,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=539393.3333333334, ans=0.0 2023-11-19 03:12:32,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=539393.3333333334, ans=0.125 2023-11-19 03:12:35,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=539393.3333333334, ans=0.1 2023-11-19 03:12:35,461 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. limit=6.0 2023-11-19 03:12:54,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=539526.6666666666, ans=0.125 2023-11-19 03:13:04,385 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 8800, loss[loss=0.06191, simple_loss=0.06711, pruned_loss=0.01399, audio_tagging_loss=0.01436, over 13902.00 frames. ], tot_loss[loss=0.09449, simple_loss=0.112, pruned_loss=0.0275, audio_tagging_loss=0.011, over 3047925.49 frames. ], batch size: 54, lr: 9.70e-03, grad_scale: 32.0 2023-11-19 03:13:11,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=539593.3333333334, ans=0.125 2023-11-19 03:13:12,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=539593.3333333334, ans=0.125 2023-11-19 03:13:30,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=539726.6666666666, ans=0.1 2023-11-19 03:13:40,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=539793.3333333334, ans=0.0 2023-11-19 03:13:42,593 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.511e+01 8.574e+01 9.508e+01 1.041e+02 1.765e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-19 03:13:46,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=539793.3333333334, ans=0.05 2023-11-19 03:13:52,380 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=539860.0, ans=0.0 2023-11-19 03:13:54,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=539860.0, ans=0.125 2023-11-19 03:13:56,701 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.89 vs. 
limit=12.0 2023-11-19 03:13:59,451 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 8850, loss[loss=0.08345, simple_loss=0.08622, pruned_loss=0.02717, audio_tagging_loss=0.01317, over 15071.00 frames. ], tot_loss[loss=0.0949, simple_loss=0.1124, pruned_loss=0.02763, audio_tagging_loss=0.01105, over 3052278.64 frames. ], batch size: 58, lr: 9.70e-03, grad_scale: 32.0 2023-11-19 03:14:12,419 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 03:14:19,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=539993.3333333334, ans=0.1 2023-11-19 03:14:21,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=539993.3333333334, ans=0.125 2023-11-19 03:14:32,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=540126.6666666666, ans=0.05 2023-11-19 03:14:34,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=540126.6666666666, ans=0.125 2023-11-19 03:14:45,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=540193.3333333334, ans=0.125 2023-11-19 03:14:46,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=540193.3333333334, ans=0.125 2023-11-19 03:14:48,989 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.65 vs. limit=15.0 2023-11-19 03:14:51,209 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=540193.3333333334, ans=0.0 2023-11-19 03:14:52,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=540193.3333333334, ans=0.125 2023-11-19 03:14:55,149 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 8900, loss[loss=0.09424, simple_loss=0.1109, pruned_loss=0.02882, audio_tagging_loss=0.009979, over 14757.00 frames. ], tot_loss[loss=0.09442, simple_loss=0.1124, pruned_loss=0.02738, audio_tagging_loss=0.01085, over 3058221.55 frames. ], batch size: 55, lr: 9.70e-03, grad_scale: 32.0 2023-11-19 03:15:22,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=540393.3333333334, ans=0.1 2023-11-19 03:15:30,704 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=540460.0, ans=0.125 2023-11-19 03:15:32,229 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.117e+01 8.732e+01 9.510e+01 1.041e+02 1.883e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-19 03:15:41,801 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.04 vs. 
limit=15.0 2023-11-19 03:15:50,760 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 8950, loss[loss=0.08327, simple_loss=0.08585, pruned_loss=0.02362, audio_tagging_loss=0.01673, over 15720.00 frames. ], tot_loss[loss=0.09454, simple_loss=0.1127, pruned_loss=0.02749, audio_tagging_loss=0.01071, over 3066537.11 frames. ], batch size: 60, lr: 9.69e-03, grad_scale: 32.0 2023-11-19 03:15:51,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=540593.3333333334, ans=0.1 2023-11-19 03:15:52,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=540593.3333333334, ans=0.125 2023-11-19 03:15:55,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=540593.3333333334, ans=0.0 2023-11-19 03:16:09,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=540660.0, ans=0.09899494936611666 2023-11-19 03:16:09,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=540660.0, ans=15.0 2023-11-19 03:16:13,628 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.63 vs. limit=15.0 2023-11-19 03:16:15,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=540726.6666666666, ans=0.2 2023-11-19 03:16:32,311 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 03:16:39,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=540860.0, ans=0.0 2023-11-19 03:16:45,767 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 9000, loss[loss=0.1075, simple_loss=0.128, pruned_loss=0.03517, audio_tagging_loss=0.008369, over 15955.00 frames. ], tot_loss[loss=0.09443, simple_loss=0.1125, pruned_loss=0.02757, audio_tagging_loss=0.01061, over 3067388.63 frames. ], batch size: 60, lr: 9.69e-03, grad_scale: 32.0 2023-11-19 03:16:45,767 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-19 03:17:07,316 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.8723, 5.8645, 5.8258, 5.5066], device='cuda:3') 2023-11-19 03:17:18,025 INFO [train_asr.py:1147] (3/4) Epoch 7, validation: loss=0.06875, simple_loss=0.05761, pruned_loss=0.007498, audio_tagging_loss=0.03244, over 4681554.00 frames. 2023-11-19 03:17:18,026 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-19 03:17:24,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=540926.6666666666, ans=0.2 2023-11-19 03:17:29,991 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.93 vs. limit=22.5 2023-11-19 03:17:36,227 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=15.0
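The Whitening lines (scaling.py:1022) are diagnostics from modules that push layer activations toward a "white", isotropic covariance: the printed metric is near 1.0 when variance is spread evenly across directions and grows as it concentrates in a few, and the constraint only engages when the metric exceeds the limit (as in the 22.93 vs. limit=22.5 record above; most records here sit comfortably below). A hedged reconstruction of the statistic as the eigenvalue ratio E[lambda^2] / E[lambda]^2 of the channel covariance; the real module works per channel group and applies its pressure through the backward pass rather than just printing:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """x: (num_frames, num_channels). Returns ~1.0 for isotropic
        activations, larger as the covariance spectrum becomes lopsided."""
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]        # channel covariance
        eigs = torch.linalg.eigvalsh(cov)   # its eigenvalue spectrum
        return (eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20)

    x = torch.randn(10000, 256)             # already-white features
    print(whitening_metric(x))              # close to 1.0, far below limit=15.0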
2023-11-19 03:17:39,193 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.875e-01 2023-11-19 03:17:45,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=541060.0, ans=0.1 2023-11-19 03:17:54,257 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.255e+01 8.732e+01 9.313e+01 1.034e+02 1.719e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-19 03:18:03,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=541193.3333333334, ans=0.125 2023-11-19 03:18:05,607 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.95 vs. limit=15.0 2023-11-19 03:18:06,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=541193.3333333334, ans=0.2 2023-11-19 03:18:12,246 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 9050, loss[loss=0.07445, simple_loss=0.08643, pruned_loss=0.01912, audio_tagging_loss=0.01212, over 14535.00 frames. ], tot_loss[loss=0.09356, simple_loss=0.1111, pruned_loss=0.02737, audio_tagging_loss=0.01066, over 3055112.65 frames. ], batch size: 56, lr: 9.69e-03, grad_scale: 32.0 2023-11-19 03:18:14,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=541260.0, ans=0.1 2023-11-19 03:18:20,572 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.94 vs. limit=6.0 2023-11-19 03:18:24,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=541326.6666666666, ans=0.1 2023-11-19 03:18:49,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=541460.0, ans=0.1 2023-11-19 03:18:52,167 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.66 vs. limit=15.0 2023-11-19 03:19:07,361 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 9100, loss[loss=0.09034, simple_loss=0.1037, pruned_loss=0.02793, audio_tagging_loss=0.01059, over 15133.00 frames. ], tot_loss[loss=0.09325, simple_loss=0.1108, pruned_loss=0.02714, audio_tagging_loss=0.01071, over 3055890.12 frames. ], batch size: 58, lr: 9.68e-03, grad_scale: 32.0 2023-11-19 03:19:33,674 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=541726.6666666666, ans=0.125 2023-11-19 03:19:39,225 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.38 vs.
limit=15.0 2023-11-19 03:19:41,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=541793.3333333334, ans=0.0 2023-11-19 03:19:44,939 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.386e+01 8.588e+01 9.392e+01 1.039e+02 1.289e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-19 03:19:50,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=541860.0, ans=0.1 2023-11-19 03:19:50,730 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.89 vs. limit=15.0 2023-11-19 03:20:02,265 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 9150, loss[loss=0.06878, simple_loss=0.08065, pruned_loss=0.01681, audio_tagging_loss=0.01165, over 15014.00 frames. ], tot_loss[loss=0.09265, simple_loss=0.1101, pruned_loss=0.02687, audio_tagging_loss=0.01075, over 3051677.36 frames. ], batch size: 59, lr: 9.68e-03, grad_scale: 32.0 2023-11-19 03:20:07,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=541926.6666666666, ans=0.0 2023-11-19 03:20:10,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=541926.6666666666, ans=0.2 2023-11-19 03:20:14,468 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.49 vs. limit=10.0 2023-11-19 03:20:22,982 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.50 vs. limit=22.5 2023-11-19 03:20:32,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=542060.0, ans=0.0 2023-11-19 03:20:36,052 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=15.68 vs. limit=15.0 2023-11-19 03:20:57,894 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 9200, loss[loss=0.0919, simple_loss=0.1053, pruned_loss=0.02646, audio_tagging_loss=0.01281, over 15765.00 frames. ], tot_loss[loss=0.09173, simple_loss=0.1086, pruned_loss=0.02666, audio_tagging_loss=0.01076, over 3045536.85 frames. ], batch size: 58, lr: 9.68e-03, grad_scale: 32.0 2023-11-19 03:21:28,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=542393.3333333334, ans=0.125 2023-11-19 03:21:36,286 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.350e+01 8.578e+01 9.333e+01 1.009e+02 1.862e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-19 03:21:50,078 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 03:21:52,038 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 9250, loss[loss=0.08092, simple_loss=0.1038, pruned_loss=0.01718, audio_tagging_loss=0.01183, over 15877.00 frames. ], tot_loss[loss=0.09225, simple_loss=0.1094, pruned_loss=0.02679, audio_tagging_loss=0.01076, over 3040302.54 frames. 
], batch size: 60, lr: 9.68e-03, grad_scale: 32.0 2023-11-19 03:21:55,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=542593.3333333334, ans=0.125 2023-11-19 03:21:57,557 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=542593.3333333334, ans=0.125 2023-11-19 03:22:10,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=542660.0, ans=0.0 2023-11-19 03:22:20,877 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.53 vs. limit=15.0 2023-11-19 03:22:24,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=542793.3333333334, ans=0.125 2023-11-19 03:22:24,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=542793.3333333334, ans=0.2 2023-11-19 03:22:37,674 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.79 vs. limit=6.0 2023-11-19 03:22:47,236 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 9300, loss[loss=0.08721, simple_loss=0.1051, pruned_loss=0.02525, audio_tagging_loss=0.009417, over 15909.00 frames. ], tot_loss[loss=0.09227, simple_loss=0.1093, pruned_loss=0.02677, audio_tagging_loss=0.01086, over 3037344.02 frames. ], batch size: 57, lr: 9.67e-03, grad_scale: 32.0 2023-11-19 03:22:56,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=542926.6666666666, ans=0.125 2023-11-19 03:23:02,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=542993.3333333334, ans=0.0 2023-11-19 03:23:08,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=543060.0, ans=0.0 2023-11-19 03:23:12,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=543060.0, ans=0.125 2023-11-19 03:23:14,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=543060.0, ans=0.125 2023-11-19 03:23:26,430 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 8.462e+01 9.179e+01 9.907e+01 1.156e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-19 03:23:39,720 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.27 vs. limit=22.5 2023-11-19 03:23:42,872 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 9350, loss[loss=0.09335, simple_loss=0.103, pruned_loss=0.03106, audio_tagging_loss=0.01078, over 14553.00 frames. ], tot_loss[loss=0.0919, simple_loss=0.1088, pruned_loss=0.02661, audio_tagging_loss=0.01091, over 3039947.16 frames. 
], batch size: 55, lr: 9.67e-03, grad_scale: 16.0 2023-11-19 03:23:50,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=543260.0, ans=0.0 2023-11-19 03:23:53,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=543326.6666666666, ans=0.0 2023-11-19 03:23:54,111 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.82 vs. limit=6.0 2023-11-19 03:24:00,451 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.29 vs. limit=15.0 2023-11-19 03:24:12,685 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 03:24:16,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=543460.0, ans=0.0 2023-11-19 03:24:23,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=543460.0, ans=0.0 2023-11-19 03:24:37,154 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 9400, loss[loss=0.0674, simple_loss=0.07517, pruned_loss=0.01595, audio_tagging_loss=0.01387, over 15425.00 frames. ], tot_loss[loss=0.09193, simple_loss=0.1086, pruned_loss=0.02669, audio_tagging_loss=0.01094, over 3047394.53 frames. ], batch size: 58, lr: 9.67e-03, grad_scale: 16.0 2023-11-19 03:24:43,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=543593.3333333334, ans=0.125 2023-11-19 03:25:03,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=543726.6666666666, ans=0.0 2023-11-19 03:25:11,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=543793.3333333334, ans=0.0 2023-11-19 03:25:15,924 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 03:25:16,704 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.406e+01 9.126e+01 1.047e+02 1.267e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-19 03:25:31,571 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 9450, loss[loss=0.08327, simple_loss=0.1021, pruned_loss=0.02248, audio_tagging_loss=0.00975, over 15428.00 frames. ], tot_loss[loss=0.09271, simple_loss=0.1095, pruned_loss=0.02692, audio_tagging_loss=0.01106, over 3046657.46 frames. ], batch size: 58, lr: 9.66e-03, grad_scale: 16.0 2023-11-19 03:25:31,580 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 03:25:39,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=543926.6666666666, ans=0.125 2023-11-19 03:25:43,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=543993.3333333334, ans=0.2 2023-11-19 03:26:28,038 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 9500, loss[loss=0.1018, simple_loss=0.124, pruned_loss=0.02784, audio_tagging_loss=0.01198, over 15300.00 frames. ], tot_loss[loss=0.09262, simple_loss=0.1092, pruned_loss=0.02681, audio_tagging_loss=0.0112, over 3043164.72 frames. ], batch size: 58, lr: 9.66e-03, grad_scale: 16.0 2023-11-19 03:26:37,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=544260.0, ans=0.07 2023-11-19 03:26:53,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=544393.3333333334, ans=0.0 2023-11-19 03:26:55,700 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.23 vs. limit=15.0 2023-11-19 03:27:03,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=544460.0, ans=0.0 2023-11-19 03:27:08,271 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.649e+01 9.463e+01 1.058e+02 1.966e+02, threshold=1.893e+02, percent-clipped=1.0 2023-11-19 03:27:10,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=544460.0, ans=0.1 2023-11-19 03:27:23,680 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 9550, loss[loss=0.0926, simple_loss=0.1166, pruned_loss=0.02573, audio_tagging_loss=0.008557, over 14845.00 frames. ], tot_loss[loss=0.09324, simple_loss=0.1098, pruned_loss=0.02709, audio_tagging_loss=0.01124, over 3044282.03 frames. ], batch size: 55, lr: 9.66e-03, grad_scale: 16.0 2023-11-19 03:27:27,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=544593.3333333334, ans=0.125 2023-11-19 03:27:28,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=544593.3333333334, ans=0.2 2023-11-19 03:27:33,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=544660.0, ans=0.125 2023-11-19 03:27:47,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=544726.6666666666, ans=0.0 2023-11-19 03:28:09,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=544860.0, ans=0.0 2023-11-19 03:28:17,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=544926.6666666666, ans=0.1 2023-11-19 03:28:18,803 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 9600, loss[loss=0.08257, simple_loss=0.1011, pruned_loss=0.02192, audio_tagging_loss=0.01009, over 14112.00 frames. ], tot_loss[loss=0.09308, simple_loss=0.11, pruned_loss=0.02692, audio_tagging_loss=0.01115, over 3046602.29 frames. 
], batch size: 57, lr: 9.66e-03, grad_scale: 32.0 2023-11-19 03:28:30,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=544993.3333333334, ans=0.07 2023-11-19 03:28:34,747 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.22 vs. limit=10.0 2023-11-19 03:28:59,135 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.567e+01 8.431e+01 9.173e+01 1.006e+02 1.337e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-19 03:29:15,057 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 9650, loss[loss=0.09357, simple_loss=0.1197, pruned_loss=0.02481, audio_tagging_loss=0.008913, over 15159.00 frames. ], tot_loss[loss=0.09304, simple_loss=0.1101, pruned_loss=0.02685, audio_tagging_loss=0.01113, over 3047477.03 frames. ], batch size: 57, lr: 9.65e-03, grad_scale: 32.0 2023-11-19 03:29:15,729 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.54 vs. limit=22.5 2023-11-19 03:29:19,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=545260.0, ans=0.1 2023-11-19 03:29:22,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=545260.0, ans=0.125 2023-11-19 03:29:36,436 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.97 vs. limit=15.0 2023-11-19 03:30:10,038 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 9700, loss[loss=0.07619, simple_loss=0.08617, pruned_loss=0.02312, audio_tagging_loss=0.00998, over 14920.00 frames. ], tot_loss[loss=0.09422, simple_loss=0.1119, pruned_loss=0.02749, audio_tagging_loss=0.01078, over 3051611.63 frames. ], batch size: 58, lr: 9.65e-03, grad_scale: 32.0 2023-11-19 03:30:50,518 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.966e+01 8.564e+01 9.508e+01 1.033e+02 1.418e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-19 03:31:05,280 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.22 vs. limit=15.0 2023-11-19 03:31:05,787 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 9750, loss[loss=0.09748, simple_loss=0.1108, pruned_loss=0.0301, audio_tagging_loss=0.01195, over 15582.00 frames. ], tot_loss[loss=0.09379, simple_loss=0.1115, pruned_loss=0.02738, audio_tagging_loss=0.01064, over 3048459.73 frames. ], batch size: 58, lr: 9.65e-03, grad_scale: 32.0 2023-11-19 03:31:35,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=546060.0, ans=0.2 2023-11-19 03:31:50,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=546193.3333333334, ans=0.125 2023-11-19 03:32:02,945 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 9800, loss[loss=0.09919, simple_loss=0.1153, pruned_loss=0.02978, audio_tagging_loss=0.01177, over 15268.00 frames. ], tot_loss[loss=0.0926, simple_loss=0.1101, pruned_loss=0.02694, audio_tagging_loss=0.01059, over 3047658.86 frames. 
], batch size: 59, lr: 9.64e-03, grad_scale: 32.0 2023-11-19 03:32:11,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=546260.0, ans=0.0 2023-11-19 03:32:21,227 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=546326.6666666666, ans=0.2 2023-11-19 03:32:34,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=546460.0, ans=0.025 2023-11-19 03:32:38,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=546460.0, ans=0.125 2023-11-19 03:32:43,158 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.045e+01 8.602e+01 9.393e+01 1.096e+02 1.685e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-19 03:32:52,701 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 03:32:57,949 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 9850, loss[loss=0.06643, simple_loss=0.08121, pruned_loss=0.01438, audio_tagging_loss=0.01145, over 14960.00 frames. ], tot_loss[loss=0.09254, simple_loss=0.1099, pruned_loss=0.02688, audio_tagging_loss=0.01071, over 3047324.11 frames. ], batch size: 58, lr: 9.64e-03, grad_scale: 32.0 2023-11-19 03:33:05,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=546593.3333333334, ans=0.125 2023-11-19 03:33:19,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=546726.6666666666, ans=0.1 2023-11-19 03:33:21,848 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.97 vs. limit=15.0 2023-11-19 03:33:27,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=546726.6666666666, ans=0.0 2023-11-19 03:33:34,934 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.62 vs. limit=15.0 2023-11-19 03:33:41,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=546860.0, ans=0.125 2023-11-19 03:33:41,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=546860.0, ans=0.125 2023-11-19 03:33:42,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=546860.0, ans=0.2 2023-11-19 03:33:44,311 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.98 vs. 
limit=15.0 2023-11-19 03:33:49,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=546860.0, ans=0.1 2023-11-19 03:33:53,976 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 9900, loss[loss=0.08951, simple_loss=0.1038, pruned_loss=0.02571, audio_tagging_loss=0.01188, over 15169.00 frames. ], tot_loss[loss=0.09283, simple_loss=0.1103, pruned_loss=0.02689, audio_tagging_loss=0.01079, over 3045853.21 frames. ], batch size: 59, lr: 9.64e-03, grad_scale: 32.0 2023-11-19 03:33:56,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=546926.6666666666, ans=0.0 2023-11-19 03:34:00,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=546926.6666666666, ans=0.125 2023-11-19 03:34:11,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=546993.3333333334, ans=0.125 2023-11-19 03:34:23,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=547060.0, ans=0.1 2023-11-19 03:34:24,320 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=547060.0, ans=0.1 2023-11-19 03:34:34,603 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.318e+01 8.700e+01 9.311e+01 1.023e+02 1.421e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-19 03:34:38,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=547193.3333333334, ans=0.1 2023-11-19 03:34:50,610 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 9950, loss[loss=0.1281, simple_loss=0.1546, pruned_loss=0.04217, audio_tagging_loss=0.008651, over 15489.00 frames. ], tot_loss[loss=0.09274, simple_loss=0.1102, pruned_loss=0.02689, audio_tagging_loss=0.01075, over 3051112.14 frames. ], batch size: 59, lr: 9.64e-03, grad_scale: 16.0 2023-11-19 03:35:32,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=547460.0, ans=0.07 2023-11-19 03:35:39,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=547526.6666666666, ans=0.0 2023-11-19 03:35:42,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=547526.6666666666, ans=0.1 2023-11-19 03:35:45,522 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 10000, loss[loss=0.09297, simple_loss=0.1125, pruned_loss=0.02401, audio_tagging_loss=0.01269, over 15568.00 frames. ], tot_loss[loss=0.09269, simple_loss=0.1103, pruned_loss=0.02679, audio_tagging_loss=0.01073, over 3056937.04 frames. ], batch size: 56, lr: 9.63e-03, grad_scale: 32.0 2023-11-19 03:35:47,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=547593.3333333334, ans=0.2 2023-11-19 03:35:52,439 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.47 vs. 
limit=15.0 2023-11-19 03:35:56,347 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=547660.0, ans=0.1 2023-11-19 03:36:06,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=547726.6666666666, ans=0.125 2023-11-19 03:36:13,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=547726.6666666666, ans=0.125 2023-11-19 03:36:16,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=547726.6666666666, ans=0.0 2023-11-19 03:36:26,884 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.651e+01 9.520e+01 1.034e+02 1.455e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-19 03:36:30,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=547860.0, ans=0.125 2023-11-19 03:36:40,560 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 10050, loss[loss=0.1044, simple_loss=0.126, pruned_loss=0.03225, audio_tagging_loss=0.009164, over 15808.00 frames. ], tot_loss[loss=0.09257, simple_loss=0.1103, pruned_loss=0.02669, audio_tagging_loss=0.01074, over 3058919.06 frames. ], batch size: 57, lr: 9.63e-03, grad_scale: 32.0 2023-11-19 03:36:50,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=547926.6666666666, ans=0.0 2023-11-19 03:36:57,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=547993.3333333334, ans=0.0 2023-11-19 03:37:02,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=547993.3333333334, ans=0.125 2023-11-19 03:37:11,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=548060.0, ans=0.0 2023-11-19 03:37:24,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=548193.3333333334, ans=0.125 2023-11-19 03:37:37,700 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 10100, loss[loss=0.08739, simple_loss=0.1039, pruned_loss=0.02548, audio_tagging_loss=0.009949, over 15725.00 frames. ], tot_loss[loss=0.0935, simple_loss=0.1112, pruned_loss=0.02711, audio_tagging_loss=0.01077, over 3054755.19 frames. ], batch size: 60, lr: 9.63e-03, grad_scale: 32.0 2023-11-19 03:37:42,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=548260.0, ans=0.125 2023-11-19 03:38:18,419 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.744e+01 8.585e+01 9.588e+01 1.090e+02 1.708e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-19 03:38:19,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=548460.0, ans=0.2 2023-11-19 03:38:20,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=548460.0, ans=0.0 2023-11-19 03:38:23,267 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 03:38:32,757 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 10150, loss[loss=0.1023, simple_loss=0.1293, pruned_loss=0.02811, audio_tagging_loss=0.009533, over 15391.00 frames. ], tot_loss[loss=0.09339, simple_loss=0.1111, pruned_loss=0.02705, audio_tagging_loss=0.0108, over 3054701.86 frames. ], batch size: 58, lr: 9.62e-03, grad_scale: 32.0 2023-11-19 03:38:37,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=548593.3333333334, ans=0.1 2023-11-19 03:38:41,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=548593.3333333334, ans=0.2 2023-11-19 03:38:46,679 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=548660.0, ans=0.04949747468305833 2023-11-19 03:38:50,103 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.96 vs. limit=10.0 2023-11-19 03:38:50,973 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 03:38:50,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=548660.0, ans=0.125 2023-11-19 03:38:59,163 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 03:39:18,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=548860.0, ans=10.0 2023-11-19 03:39:25,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=548860.0, ans=0.1 2023-11-19 03:39:27,537 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 10200, loss[loss=0.113, simple_loss=0.1274, pruned_loss=0.03971, audio_tagging_loss=0.009551, over 15406.00 frames. ], tot_loss[loss=0.09387, simple_loss=0.1116, pruned_loss=0.0272, audio_tagging_loss=0.01085, over 3056347.69 frames. ], batch size: 57, lr: 9.62e-03, grad_scale: 32.0 2023-11-19 03:39:29,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=548926.6666666666, ans=0.125 2023-11-19 03:39:34,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=548926.6666666666, ans=0.125 2023-11-19 03:39:49,370 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 03:39:57,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=549060.0, ans=0.2 2023-11-19 03:40:07,625 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=22.5 2023-11-19 03:40:08,250 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.328e+01 8.839e+01 9.897e+01 1.124e+02 1.590e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-19 03:40:13,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=549193.3333333334, ans=0.035 2023-11-19 03:40:16,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=549193.3333333334, ans=0.1 2023-11-19 03:40:23,221 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 10250, loss[loss=0.07872, simple_loss=0.09754, pruned_loss=0.01914, audio_tagging_loss=0.01081, over 14130.00 frames. ], tot_loss[loss=0.09277, simple_loss=0.1104, pruned_loss=0.02663, audio_tagging_loss=0.01096, over 3047163.26 frames. ], batch size: 53, lr: 9.62e-03, grad_scale: 32.0 2023-11-19 03:40:30,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=549260.0, ans=0.0 2023-11-19 03:40:42,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=549326.6666666666, ans=0.1 2023-11-19 03:40:50,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=549393.3333333334, ans=0.0 2023-11-19 03:40:51,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=549393.3333333334, ans=0.0 2023-11-19 03:40:55,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=549460.0, ans=0.1 2023-11-19 03:41:03,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=549460.0, ans=0.1 2023-11-19 03:41:19,416 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 10300, loss[loss=0.07184, simple_loss=0.08082, pruned_loss=0.01864, audio_tagging_loss=0.01279, over 15588.00 frames. ], tot_loss[loss=0.09263, simple_loss=0.1105, pruned_loss=0.02653, audio_tagging_loss=0.01085, over 3049286.14 frames. 
], batch size: 61, lr: 9.61e-03, grad_scale: 32.0 2023-11-19 03:41:25,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=549593.3333333334, ans=0.035 2023-11-19 03:41:33,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=549660.0, ans=0.2 2023-11-19 03:42:00,389 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.361e+01 8.478e+01 9.203e+01 9.958e+01 1.173e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-19 03:42:10,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=549860.0, ans=0.125 2023-11-19 03:42:11,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=549860.0, ans=0.125 2023-11-19 03:42:13,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=549926.6666666666, ans=0.125 2023-11-19 03:42:14,016 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 10350, loss[loss=0.0972, simple_loss=0.1189, pruned_loss=0.02875, audio_tagging_loss=0.008977, over 15037.00 frames. ], tot_loss[loss=0.09299, simple_loss=0.111, pruned_loss=0.02659, audio_tagging_loss=0.01089, over 3052248.83 frames. ], batch size: 54, lr: 9.61e-03, grad_scale: 32.0 2023-11-19 03:42:15,549 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.82 vs. limit=22.5 2023-11-19 03:42:52,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=550126.6666666666, ans=0.95 2023-11-19 03:42:58,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=550193.3333333334, ans=0.07 2023-11-19 03:43:05,357 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=550193.3333333334, ans=0.2 2023-11-19 03:43:08,765 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 10400, loss[loss=0.08125, simple_loss=0.1017, pruned_loss=0.01955, audio_tagging_loss=0.01084, over 14512.00 frames. ], tot_loss[loss=0.09307, simple_loss=0.1109, pruned_loss=0.02671, audio_tagging_loss=0.01094, over 3049484.51 frames. ], batch size: 56, lr: 9.61e-03, grad_scale: 32.0 2023-11-19 03:43:28,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=550326.6666666666, ans=0.125 2023-11-19 03:43:36,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=550393.3333333334, ans=0.0 2023-11-19 03:43:51,065 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.078e+01 8.573e+01 9.410e+01 1.023e+02 1.490e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-19 03:43:51,333 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=550460.0, ans=0.2 2023-11-19 03:44:02,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=550526.6666666666, ans=0.5 2023-11-19 03:44:04,839 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 10450, loss[loss=0.1299, simple_loss=0.1714, pruned_loss=0.03946, audio_tagging_loss=0.00475, over 15938.00 frames. 
], tot_loss[loss=0.09328, simple_loss=0.1113, pruned_loss=0.02677, audio_tagging_loss=0.01088, over 3046104.59 frames. ], batch size: 55, lr: 9.61e-03, grad_scale: 32.0 2023-11-19 03:44:40,807 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.03 vs. limit=12.0 2023-11-19 03:44:57,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=550860.0, ans=0.125 2023-11-19 03:44:59,701 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 10500, loss[loss=0.1006, simple_loss=0.1245, pruned_loss=0.02945, audio_tagging_loss=0.008895, over 14663.00 frames. ], tot_loss[loss=0.09262, simple_loss=0.1102, pruned_loss=0.02666, audio_tagging_loss=0.01084, over 3044025.87 frames. ], batch size: 53, lr: 9.60e-03, grad_scale: 32.0 2023-11-19 03:45:01,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=550926.6666666666, ans=0.0 2023-11-19 03:45:14,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=550993.3333333334, ans=0.125 2023-11-19 03:45:14,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=550993.3333333334, ans=0.0 2023-11-19 03:45:41,934 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.670e+01 8.439e+01 9.051e+01 1.036e+02 1.339e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-19 03:45:49,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=551193.3333333334, ans=0.2 2023-11-19 03:45:55,184 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 10550, loss[loss=0.108, simple_loss=0.1372, pruned_loss=0.0331, audio_tagging_loss=0.006357, over 14840.00 frames. ], tot_loss[loss=0.09256, simple_loss=0.1107, pruned_loss=0.02662, audio_tagging_loss=0.01061, over 3043017.23 frames. ], batch size: 57, lr: 9.60e-03, grad_scale: 32.0 2023-11-19 03:45:55,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=551260.0, ans=0.125 2023-11-19 03:46:02,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=551260.0, ans=0.0 2023-11-19 03:46:36,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=551460.0, ans=0.125 2023-11-19 03:46:42,853 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.16 vs. limit=12.0 2023-11-19 03:46:51,179 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 10600, loss[loss=0.1038, simple_loss=0.1288, pruned_loss=0.02999, audio_tagging_loss=0.009459, over 16083.00 frames. ], tot_loss[loss=0.09253, simple_loss=0.1105, pruned_loss=0.02665, audio_tagging_loss=0.01062, over 3050581.56 frames. ], batch size: 57, lr: 9.60e-03, grad_scale: 32.0 2023-11-19 03:46:59,652 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.89 vs. 
limit=15.0 2023-11-19 03:47:33,798 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.037e+01 8.515e+01 9.253e+01 1.023e+02 1.317e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-19 03:47:34,050 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=551793.3333333334, ans=0.0 2023-11-19 03:47:47,251 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 10650, loss[loss=0.09232, simple_loss=0.1028, pruned_loss=0.02707, audio_tagging_loss=0.01386, over 15502.00 frames. ], tot_loss[loss=0.09343, simple_loss=0.1115, pruned_loss=0.02702, audio_tagging_loss=0.01065, over 3042936.54 frames. ], batch size: 59, lr: 9.59e-03, grad_scale: 32.0 2023-11-19 03:47:59,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=551993.3333333334, ans=0.125 2023-11-19 03:47:59,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=551993.3333333334, ans=0.0 2023-11-19 03:48:10,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=552060.0, ans=0.125 2023-11-19 03:48:36,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=552193.3333333334, ans=0.04949747468305833 2023-11-19 03:48:43,085 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 10700, loss[loss=0.09219, simple_loss=0.1162, pruned_loss=0.02289, audio_tagging_loss=0.0112, over 14016.00 frames. ], tot_loss[loss=0.09337, simple_loss=0.1115, pruned_loss=0.02694, audio_tagging_loss=0.01066, over 3043449.05 frames. ], batch size: 53, lr: 9.59e-03, grad_scale: 32.0 2023-11-19 03:49:01,953 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.66 vs. limit=15.0 2023-11-19 03:49:25,174 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.213e+01 8.579e+01 9.318e+01 1.032e+02 2.166e+02, threshold=1.864e+02, percent-clipped=1.0 2023-11-19 03:49:39,633 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 10750, loss[loss=0.09592, simple_loss=0.124, pruned_loss=0.02645, audio_tagging_loss=0.007472, over 15310.00 frames. ], tot_loss[loss=0.09344, simple_loss=0.1116, pruned_loss=0.02698, audio_tagging_loss=0.01067, over 3043370.28 frames. ], batch size: 55, lr: 9.59e-03, grad_scale: 32.0 2023-11-19 03:49:40,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=552593.3333333334, ans=0.125 2023-11-19 03:49:41,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=552593.3333333334, ans=0.125 2023-11-19 03:49:46,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=552593.3333333334, ans=0.125 2023-11-19 03:49:51,809 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.49 vs. limit=15.0 2023-11-19 03:50:00,935 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.24 vs. 
limit=22.5 2023-11-19 03:50:13,585 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2023-11-19 03:50:31,661 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=552860.0, ans=0.0 2023-11-19 03:50:34,595 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 10800, loss[loss=0.08688, simple_loss=0.09907, pruned_loss=0.0272, audio_tagging_loss=0.01014, over 14757.00 frames. ], tot_loss[loss=0.09299, simple_loss=0.111, pruned_loss=0.02676, audio_tagging_loss=0.01073, over 3047459.54 frames. ], batch size: 57, lr: 9.59e-03, grad_scale: 32.0 2023-11-19 03:50:39,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=552926.6666666666, ans=0.2 2023-11-19 03:50:44,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=552993.3333333334, ans=0.0 2023-11-19 03:50:56,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=553060.0, ans=0.07 2023-11-19 03:51:01,784 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.50 vs. limit=22.5 2023-11-19 03:51:04,789 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.74 vs. limit=10.0 2023-11-19 03:51:16,709 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.479e+01 8.594e+01 9.337e+01 1.055e+02 1.336e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-19 03:51:19,610 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0 2023-11-19 03:51:30,084 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 10850, loss[loss=0.06571, simple_loss=0.0713, pruned_loss=0.01759, audio_tagging_loss=0.01247, over 16465.00 frames. ], tot_loss[loss=0.09258, simple_loss=0.1104, pruned_loss=0.0266, audio_tagging_loss=0.01078, over 3044276.85 frames. ], batch size: 64, lr: 9.58e-03, grad_scale: 32.0 2023-11-19 03:51:40,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=553260.0, ans=0.2 2023-11-19 03:52:07,137 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.19 vs. limit=22.5 2023-11-19 03:52:24,132 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 03:52:27,320 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 10900, loss[loss=0.0922, simple_loss=0.1205, pruned_loss=0.02505, audio_tagging_loss=0.006924, over 16414.00 frames. ], tot_loss[loss=0.09322, simple_loss=0.111, pruned_loss=0.02689, audio_tagging_loss=0.01085, over 3045126.25 frames. 
], batch size: 60, lr: 9.58e-03, grad_scale: 32.0 2023-11-19 03:52:28,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=553593.3333333334, ans=0.125 2023-11-19 03:52:29,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=553593.3333333334, ans=0.125 2023-11-19 03:52:39,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=553660.0, ans=0.125 2023-11-19 03:52:59,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=553793.3333333334, ans=0.0 2023-11-19 03:53:02,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=553793.3333333334, ans=0.1 2023-11-19 03:53:09,146 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.29 vs. limit=15.0 2023-11-19 03:53:09,521 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.824e+01 8.290e+01 8.895e+01 9.757e+01 1.197e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-19 03:53:22,210 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 10950, loss[loss=0.0928, simple_loss=0.1181, pruned_loss=0.02412, audio_tagging_loss=0.00962, over 15070.00 frames. ], tot_loss[loss=0.09342, simple_loss=0.1112, pruned_loss=0.02698, audio_tagging_loss=0.01083, over 3045867.45 frames. ], batch size: 55, lr: 9.58e-03, grad_scale: 32.0 2023-11-19 03:53:22,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=553926.6666666666, ans=0.0 2023-11-19 03:53:23,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=553926.6666666666, ans=0.125 2023-11-19 03:53:25,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=553926.6666666666, ans=0.025 2023-11-19 03:53:26,563 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 03:53:46,377 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=554060.0, ans=0.125 2023-11-19 03:54:00,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=554126.6666666666, ans=0.125 2023-11-19 03:54:05,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=554193.3333333334, ans=0.125 2023-11-19 03:54:15,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=554193.3333333334, ans=0.2 2023-11-19 03:54:17,503 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 11000, loss[loss=0.1217, simple_loss=0.1586, pruned_loss=0.03425, audio_tagging_loss=0.008105, over 15104.00 frames. ], tot_loss[loss=0.0933, simple_loss=0.1109, pruned_loss=0.02694, audio_tagging_loss=0.0109, over 3046225.07 frames. ], batch size: 56, lr: 9.57e-03, grad_scale: 32.0 2023-11-19 03:54:26,555 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 03:54:29,630 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0 2023-11-19 03:54:30,554 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=554326.6666666666, ans=0.125 2023-11-19 03:54:45,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=554393.3333333334, ans=0.95 2023-11-19 03:54:51,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=554460.0, ans=0.125 2023-11-19 03:54:58,748 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.653e+01 8.672e+01 9.432e+01 1.068e+02 1.333e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-19 03:55:00,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=554526.6666666666, ans=0.125 2023-11-19 03:55:02,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=554526.6666666666, ans=0.0 2023-11-19 03:55:13,612 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 11050, loss[loss=0.08977, simple_loss=0.1087, pruned_loss=0.02511, audio_tagging_loss=0.01031, over 14589.00 frames. ], tot_loss[loss=0.09312, simple_loss=0.1107, pruned_loss=0.02687, audio_tagging_loss=0.01088, over 3048113.62 frames. ], batch size: 56, lr: 9.57e-03, grad_scale: 32.0 2023-11-19 03:55:15,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=554593.3333333334, ans=15.0 2023-11-19 03:55:15,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=554593.3333333334, ans=0.0 2023-11-19 03:55:23,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=554660.0, ans=0.1 2023-11-19 03:55:24,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=554660.0, ans=0.125 2023-11-19 03:55:25,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=554660.0, ans=0.125 2023-11-19 03:55:34,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=554726.6666666666, ans=0.0 2023-11-19 03:55:49,887 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.40 vs. limit=12.0 2023-11-19 03:56:04,940 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=554860.0, ans=0.1 2023-11-19 03:56:08,862 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 11100, loss[loss=0.08883, simple_loss=0.1084, pruned_loss=0.02465, audio_tagging_loss=0.009995, over 15996.00 frames. 
], tot_loss[loss=0.094, simple_loss=0.1118, pruned_loss=0.02716, audio_tagging_loss=0.01096, over 3049912.48 frames. ], batch size: 59, lr: 9.57e-03, grad_scale: 32.0 2023-11-19 03:56:14,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=554926.6666666666, ans=0.2 2023-11-19 03:56:17,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=554926.6666666666, ans=0.125 2023-11-19 03:56:23,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=554993.3333333334, ans=0.125 2023-11-19 03:56:42,028 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.09 vs. limit=15.0 2023-11-19 03:56:46,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=555126.6666666666, ans=0.1 2023-11-19 03:56:48,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=555126.6666666666, ans=0.125 2023-11-19 03:56:51,124 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.318e+01 8.615e+01 9.620e+01 1.023e+02 1.432e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-19 03:57:02,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=555260.0, ans=0.125 2023-11-19 03:57:03,795 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 11150, loss[loss=0.1237, simple_loss=0.1395, pruned_loss=0.04319, audio_tagging_loss=0.01072, over 16063.00 frames. ], tot_loss[loss=0.09426, simple_loss=0.1117, pruned_loss=0.02727, audio_tagging_loss=0.01112, over 3056302.29 frames. ], batch size: 59, lr: 9.57e-03, grad_scale: 32.0 2023-11-19 03:57:08,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=555260.0, ans=0.1 2023-11-19 03:57:27,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=555393.3333333334, ans=0.125 2023-11-19 03:57:34,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=555393.3333333334, ans=0.125 2023-11-19 03:57:52,550 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.28 vs. limit=15.0 2023-11-19 03:57:59,492 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 11200, loss[loss=0.08151, simple_loss=0.1006, pruned_loss=0.01951, audio_tagging_loss=0.01169, over 15404.00 frames. ], tot_loss[loss=0.09417, simple_loss=0.1118, pruned_loss=0.02715, audio_tagging_loss=0.01113, over 3057273.15 frames. ], batch size: 58, lr: 9.56e-03, grad_scale: 32.0 2023-11-19 03:58:00,638 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.03 vs. 
limit=10.0 2023-11-19 03:58:04,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=555593.3333333334, ans=0.0 2023-11-19 03:58:17,241 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.13 vs. limit=12.0 2023-11-19 03:58:20,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=555726.6666666666, ans=15.0 2023-11-19 03:58:21,458 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0 2023-11-19 03:58:22,375 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.672e-01 2023-11-19 03:58:22,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=555726.6666666666, ans=0.0 2023-11-19 03:58:27,659 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=555726.6666666666, ans=0.125 2023-11-19 03:58:41,678 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.070e+01 8.552e+01 9.021e+01 1.004e+02 1.285e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-19 03:58:42,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=555793.3333333334, ans=0.125 2023-11-19 03:58:43,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=555860.0, ans=0.125 2023-11-19 03:58:49,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=555860.0, ans=0.125 2023-11-19 03:58:55,028 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 11250, loss[loss=0.08143, simple_loss=0.08918, pruned_loss=0.02119, audio_tagging_loss=0.01566, over 14235.00 frames. ], tot_loss[loss=0.09301, simple_loss=0.1101, pruned_loss=0.02685, audio_tagging_loss=0.01111, over 3049751.35 frames. ], batch size: 54, lr: 9.56e-03, grad_scale: 32.0 2023-11-19 03:58:56,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=555926.6666666666, ans=0.2 2023-11-19 03:59:04,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=555993.3333333334, ans=0.125 2023-11-19 03:59:05,186 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.80 vs. limit=6.0 2023-11-19 03:59:12,799 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.26 vs. limit=15.0 2023-11-19 03:59:15,954 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.83 vs. 
limit=15.0 2023-11-19 03:59:16,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=556060.0, ans=0.0 2023-11-19 03:59:35,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=556126.6666666666, ans=0.125 2023-11-19 03:59:35,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=556126.6666666666, ans=0.125 2023-11-19 03:59:43,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=556193.3333333334, ans=0.2 2023-11-19 03:59:46,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=556193.3333333334, ans=0.1 2023-11-19 03:59:50,343 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 11300, loss[loss=0.08972, simple_loss=0.1108, pruned_loss=0.02583, audio_tagging_loss=0.00847, over 14130.00 frames. ], tot_loss[loss=0.0917, simple_loss=0.1088, pruned_loss=0.02628, audio_tagging_loss=0.01099, over 3050051.84 frames. ], batch size: 54, lr: 9.56e-03, grad_scale: 32.0 2023-11-19 03:59:50,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=556260.0, ans=0.1 2023-11-19 03:59:51,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=556260.0, ans=0.125 2023-11-19 04:00:00,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=556326.6666666666, ans=0.2 2023-11-19 04:00:14,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=556393.3333333334, ans=0.125 2023-11-19 04:00:24,335 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.24 vs. limit=22.5 2023-11-19 04:00:26,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=556460.0, ans=0.1 2023-11-19 04:00:28,526 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.62 vs. limit=15.0 2023-11-19 04:00:32,262 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.090e+01 8.772e+01 9.510e+01 1.073e+02 1.316e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-19 04:00:34,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=556526.6666666666, ans=0.125 2023-11-19 04:00:42,729 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.48 vs. limit=10.0 2023-11-19 04:00:44,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=556526.6666666666, ans=0.125 2023-11-19 04:00:46,037 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 11350, loss[loss=0.08148, simple_loss=0.1074, pruned_loss=0.02151, audio_tagging_loss=0.006274, over 15355.00 frames. ], tot_loss[loss=0.09264, simple_loss=0.1102, pruned_loss=0.02679, audio_tagging_loss=0.01075, over 3047804.96 frames. 
], batch size: 57, lr: 9.55e-03, grad_scale: 32.0 2023-11-19 04:00:55,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=556593.3333333334, ans=0.125 2023-11-19 04:00:57,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=556660.0, ans=0.125 2023-11-19 04:01:26,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=556793.3333333334, ans=0.2 2023-11-19 04:01:38,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=556860.0, ans=0.125 2023-11-19 04:01:41,027 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.60 vs. limit=22.5 2023-11-19 04:01:41,529 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 11400, loss[loss=0.08298, simple_loss=0.1045, pruned_loss=0.02163, audio_tagging_loss=0.00908, over 15118.00 frames. ], tot_loss[loss=0.0925, simple_loss=0.1101, pruned_loss=0.02675, audio_tagging_loss=0.01071, over 3048250.21 frames. ], batch size: 59, lr: 9.55e-03, grad_scale: 32.0 2023-11-19 04:01:54,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=556993.3333333334, ans=0.1 2023-11-19 04:02:23,531 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.302e+01 8.621e+01 9.378e+01 1.036e+02 2.217e+02, threshold=1.876e+02, percent-clipped=1.0 2023-11-19 04:02:27,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=557193.3333333334, ans=0.0 2023-11-19 04:02:30,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=557193.3333333334, ans=0.0 2023-11-19 04:02:36,334 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 11450, loss[loss=0.09717, simple_loss=0.1186, pruned_loss=0.0286, audio_tagging_loss=0.00929, over 15702.00 frames. ], tot_loss[loss=0.09189, simple_loss=0.1094, pruned_loss=0.02644, audio_tagging_loss=0.01076, over 3048943.89 frames. ], batch size: 57, lr: 9.55e-03, grad_scale: 32.0 2023-11-19 04:02:37,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=557260.0, ans=0.0 2023-11-19 04:02:45,360 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.66 vs. limit=15.0 2023-11-19 04:02:53,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=557326.6666666666, ans=0.09899494936611666 2023-11-19 04:02:58,397 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.81 vs. limit=15.0 2023-11-19 04:03:25,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=557526.6666666666, ans=0.04949747468305833 2023-11-19 04:03:32,397 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 11500, loss[loss=0.1249, simple_loss=0.1554, pruned_loss=0.03902, audio_tagging_loss=0.008191, over 16402.00 frames. ], tot_loss[loss=0.09312, simple_loss=0.1109, pruned_loss=0.02695, audio_tagging_loss=0.01073, over 3051634.26 frames. 
], batch size: 56, lr: 9.55e-03, grad_scale: 16.0 2023-11-19 04:03:38,588 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=557593.3333333334, ans=0.0 2023-11-19 04:04:07,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=557793.3333333334, ans=0.1 2023-11-19 04:04:07,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=557793.3333333334, ans=0.0 2023-11-19 04:04:09,108 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.34 vs. limit=15.0 2023-11-19 04:04:10,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=557793.3333333334, ans=0.0 2023-11-19 04:04:15,299 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.71 vs. limit=15.0 2023-11-19 04:04:15,903 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.885e+01 8.787e+01 9.889e+01 1.125e+02 1.791e+02, threshold=1.978e+02, percent-clipped=0.0 2023-11-19 04:04:16,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=557860.0, ans=0.125 2023-11-19 04:04:28,859 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 11550, loss[loss=0.0836, simple_loss=0.09413, pruned_loss=0.02255, audio_tagging_loss=0.01398, over 15629.00 frames. ], tot_loss[loss=0.09333, simple_loss=0.1111, pruned_loss=0.02703, audio_tagging_loss=0.01074, over 3057084.46 frames. ], batch size: 59, lr: 9.54e-03, grad_scale: 16.0 2023-11-19 04:04:37,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=557926.6666666666, ans=0.125 2023-11-19 04:04:45,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=557993.3333333334, ans=0.1 2023-11-19 04:04:49,374 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.55 vs. limit=12.0 2023-11-19 04:04:49,545 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.25 vs. limit=15.0 2023-11-19 04:05:01,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=558126.6666666666, ans=0.2 2023-11-19 04:05:02,029 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 04:05:16,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=558193.3333333334, ans=0.125 2023-11-19 04:05:20,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=558193.3333333334, ans=0.5 2023-11-19 04:05:23,642 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 11600, loss[loss=0.08536, simple_loss=0.1018, pruned_loss=0.02191, audio_tagging_loss=0.01254, over 15493.00 frames. ], tot_loss[loss=0.09348, simple_loss=0.1114, pruned_loss=0.02706, audio_tagging_loss=0.01072, over 3062094.29 frames. ], batch size: 58, lr: 9.54e-03, grad_scale: 32.0 2023-11-19 04:05:26,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=558260.0, ans=0.02 2023-11-19 04:05:34,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=558326.6666666666, ans=0.0 2023-11-19 04:05:35,966 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.14 vs. limit=12.0 2023-11-19 04:05:43,529 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:05:53,083 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.946e-01 2023-11-19 04:05:58,098 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.23 vs. limit=22.5 2023-11-19 04:06:00,275 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.14 vs. limit=15.0 2023-11-19 04:06:05,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=558460.0, ans=0.0 2023-11-19 04:06:07,037 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.984e+01 8.676e+01 9.320e+01 1.048e+02 1.345e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-19 04:06:07,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=558526.6666666666, ans=0.125 2023-11-19 04:06:16,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=558526.6666666666, ans=0.2 2023-11-19 04:06:18,601 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 11650, loss[loss=0.09258, simple_loss=0.09989, pruned_loss=0.02973, audio_tagging_loss=0.0129, over 15138.00 frames. ], tot_loss[loss=0.0936, simple_loss=0.1116, pruned_loss=0.02706, audio_tagging_loss=0.01072, over 3058038.72 frames. ], batch size: 59, lr: 9.54e-03, grad_scale: 32.0 2023-11-19 04:06:27,368 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:06:48,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=558726.6666666666, ans=0.0 2023-11-19 04:07:14,578 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 11700, loss[loss=0.08022, simple_loss=0.09448, pruned_loss=0.02278, audio_tagging_loss=0.0102, over 15444.00 frames. ], tot_loss[loss=0.09357, simple_loss=0.1114, pruned_loss=0.02702, audio_tagging_loss=0.01083, over 3058116.04 frames. 
], batch size: 60, lr: 9.53e-03, grad_scale: 32.0 2023-11-19 04:07:18,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=558926.6666666666, ans=0.125 2023-11-19 04:07:41,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=559060.0, ans=0.2 2023-11-19 04:07:50,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=559126.6666666666, ans=0.125 2023-11-19 04:07:52,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=559126.6666666666, ans=0.0 2023-11-19 04:07:57,442 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.729e+01 8.689e+01 9.449e+01 1.084e+02 2.126e+02, threshold=1.890e+02, percent-clipped=1.0 2023-11-19 04:08:09,616 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 11750, loss[loss=0.1128, simple_loss=0.1431, pruned_loss=0.03198, audio_tagging_loss=0.009231, over 15412.00 frames. ], tot_loss[loss=0.09283, simple_loss=0.1103, pruned_loss=0.0268, audio_tagging_loss=0.01088, over 3054676.91 frames. ], batch size: 56, lr: 9.53e-03, grad_scale: 32.0 2023-11-19 04:08:33,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=559393.3333333334, ans=0.95 2023-11-19 04:08:35,576 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.35 vs. limit=12.0 2023-11-19 04:08:38,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=559393.3333333334, ans=0.0 2023-11-19 04:08:46,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=559460.0, ans=0.2 2023-11-19 04:08:54,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=559526.6666666666, ans=0.0 2023-11-19 04:08:55,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=559526.6666666666, ans=0.125 2023-11-19 04:09:03,293 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=15.0 2023-11-19 04:09:03,943 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 11800, loss[loss=0.08672, simple_loss=0.1135, pruned_loss=0.02097, audio_tagging_loss=0.009013, over 15882.00 frames. ], tot_loss[loss=0.09243, simple_loss=0.1096, pruned_loss=0.02673, audio_tagging_loss=0.01088, over 3060454.76 frames. 
], batch size: 57, lr: 9.53e-03, grad_scale: 32.0 2023-11-19 04:09:14,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=559660.0, ans=0.0 2023-11-19 04:09:14,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=559660.0, ans=0.125 2023-11-19 04:09:15,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=559660.0, ans=0.125 2023-11-19 04:09:19,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=559660.0, ans=0.125 2023-11-19 04:09:30,731 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.49 vs. limit=15.0 2023-11-19 04:09:36,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=559793.3333333334, ans=0.125 2023-11-19 04:09:38,621 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.77 vs. limit=10.0 2023-11-19 04:09:46,602 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.370e+01 8.768e+01 9.397e+01 1.015e+02 1.463e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-19 04:09:59,775 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 11850, loss[loss=0.09787, simple_loss=0.1205, pruned_loss=0.02738, audio_tagging_loss=0.01025, over 15471.00 frames. ], tot_loss[loss=0.09264, simple_loss=0.1099, pruned_loss=0.02677, audio_tagging_loss=0.01094, over 3061006.94 frames. ], batch size: 58, lr: 9.53e-03, grad_scale: 32.0 2023-11-19 04:10:27,828 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.34 vs. limit=5.0 2023-11-19 04:10:52,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=560193.3333333334, ans=0.125 2023-11-19 04:10:52,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=560193.3333333334, ans=0.1 2023-11-19 04:10:56,859 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 11900, loss[loss=0.1016, simple_loss=0.1241, pruned_loss=0.02849, audio_tagging_loss=0.01103, over 15165.00 frames. ], tot_loss[loss=0.0929, simple_loss=0.1104, pruned_loss=0.02668, audio_tagging_loss=0.01104, over 3059099.28 frames. 
], batch size: 57, lr: 9.52e-03, grad_scale: 32.0 2023-11-19 04:11:23,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=560393.3333333334, ans=0.09899494936611666 2023-11-19 04:11:31,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=560460.0, ans=0.2 2023-11-19 04:11:35,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=560460.0, ans=0.125 2023-11-19 04:11:35,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=560460.0, ans=0.2 2023-11-19 04:11:36,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=560460.0, ans=0.125 2023-11-19 04:11:40,153 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.784e+01 9.525e+01 1.032e+02 1.390e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-19 04:11:40,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=560526.6666666666, ans=0.125 2023-11-19 04:11:41,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=560526.6666666666, ans=0.1 2023-11-19 04:11:44,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=560526.6666666666, ans=0.125 2023-11-19 04:11:51,708 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.37 vs. limit=15.0 2023-11-19 04:11:52,354 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 11950, loss[loss=0.07216, simple_loss=0.08238, pruned_loss=0.01558, audio_tagging_loss=0.01538, over 15268.00 frames. ], tot_loss[loss=0.09176, simple_loss=0.1088, pruned_loss=0.02622, audio_tagging_loss=0.01115, over 3059394.13 frames. ], batch size: 58, lr: 9.52e-03, grad_scale: 32.0 2023-11-19 04:12:05,932 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.58 vs. limit=15.0 2023-11-19 04:12:11,317 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0 2023-11-19 04:12:26,424 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=560793.3333333334, ans=0.0 2023-11-19 04:12:46,240 INFO [train_asr.py:1115] (3/4) Epoch 7, batch 12000, loss[loss=0.09336, simple_loss=0.1202, pruned_loss=0.02414, audio_tagging_loss=0.009106, over 15186.00 frames. ], tot_loss[loss=0.09157, simple_loss=0.1086, pruned_loss=0.02609, audio_tagging_loss=0.01118, over 3058595.89 frames. ], batch size: 54, lr: 9.52e-03, grad_scale: 32.0 2023-11-19 04:12:46,241 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-19 04:13:13,605 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1963, 2.3602, 5.1614, 2.2704], device='cuda:3') 2023-11-19 04:13:19,245 INFO [train_asr.py:1147] (3/4) Epoch 7, validation: loss=0.0682, simple_loss=0.05751, pruned_loss=0.007422, audio_tagging_loss=0.03202, over 4681554.00 frames. 
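
[Editor's note, not part of the original log] The loss fields in the train/validation records above combine in a fixed way: throughout this excerpt, loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss holds to the printed precision (e.g. for "Epoch 7, batch 12000": 0.5 * 0.1086 + 0.02609 + 0.01118 = 0.09157, and for the Epoch 7 validation record: 0.5 * 0.05751 + 0.007422 + 0.03202 ≈ 0.0682). A minimal sketch of that bookkeeping follows; the 0.5 and 1.0 scale factors are inferred from the logged numbers, and the function name is illustrative rather than taken from train_asr.py.

def combine_losses(simple_loss: float, pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> float:
    # Recombine the per-component losses the way the log records report them.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Sanity checks against two records logged above (to printed precision).
assert abs(combine_losses(0.1086, 0.02609, 0.01118) - 0.09157) < 1e-4
assert abs(combine_losses(0.05751, 0.007422, 0.03202) - 0.0682) < 1e-4
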
2023-11-19 04:13:19,246 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-19 04:13:20,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=560926.6666666666, ans=0.0 2023-11-19 04:13:29,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=560993.3333333334, ans=0.1 2023-11-19 04:13:34,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=560993.3333333334, ans=0.125 2023-11-19 04:13:34,863 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.21 vs. limit=15.0 2023-11-19 04:13:39,643 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=561060.0, ans=0.125 2023-11-19 04:14:20,117 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 0, loss[loss=0.1189, simple_loss=0.1236, pruned_loss=0.03421, audio_tagging_loss=0.02291, over 15943.00 frames. ], tot_loss[loss=0.1189, simple_loss=0.1236, pruned_loss=0.03421, audio_tagging_loss=0.02291, over 15943.00 frames. ], batch size: 58, lr: 8.97e-03, grad_scale: 32.0 2023-11-19 04:14:20,118 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-19 04:14:43,368 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.6034, 4.1012, 3.7157, 2.8029], device='cuda:3') 2023-11-19 04:14:51,778 INFO [train_asr.py:1147] (3/4) Epoch 8, validation: loss=0.06722, simple_loss=0.05736, pruned_loss=0.007334, audio_tagging_loss=0.0312, over 4681554.00 frames. 2023-11-19 04:14:51,779 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-19 04:14:57,560 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.05 vs. limit=15.0 2023-11-19 04:15:10,723 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.068e+01 9.365e+01 1.076e+02 1.160e+02 2.715e+02, threshold=2.151e+02, percent-clipped=1.0 2023-11-19 04:15:32,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=561280.0, ans=0.0 2023-11-19 04:15:40,255 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.43 vs. limit=15.0 2023-11-19 04:15:47,581 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 50, loss[loss=0.07761, simple_loss=0.08031, pruned_loss=0.01891, audio_tagging_loss=0.01854, over 14032.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1091, pruned_loss=0.02624, audio_tagging_loss=0.02043, over 686470.25 frames. ], batch size: 56, lr: 8.96e-03, grad_scale: 32.0 2023-11-19 04:16:11,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=561546.6666666666, ans=0.1 2023-11-19 04:16:14,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=561546.6666666666, ans=0.1 2023-11-19 04:16:33,338 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.24 vs. 
limit=12.0 2023-11-19 04:16:38,891 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.15 vs. limit=22.5 2023-11-19 04:16:43,676 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 100, loss[loss=0.1099, simple_loss=0.1286, pruned_loss=0.03183, audio_tagging_loss=0.01382, over 15565.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.1087, pruned_loss=0.02658, audio_tagging_loss=0.01994, over 1203274.48 frames. ], batch size: 61, lr: 8.96e-03, grad_scale: 32.0 2023-11-19 04:16:53,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=561813.3333333334, ans=0.125 2023-11-19 04:17:02,330 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.971e+01 9.025e+01 9.629e+01 1.101e+02 1.552e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-19 04:17:07,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=561880.0, ans=0.125 2023-11-19 04:17:16,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=561946.6666666666, ans=0.125 2023-11-19 04:17:22,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=561946.6666666666, ans=0.0 2023-11-19 04:17:22,543 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=15.0 2023-11-19 04:17:23,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=561946.6666666666, ans=0.0 2023-11-19 04:17:27,705 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:17:28,017 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.95 vs. limit=15.0 2023-11-19 04:17:33,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=562013.3333333334, ans=0.015 2023-11-19 04:17:39,011 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 150, loss[loss=0.09594, simple_loss=0.1126, pruned_loss=0.02746, audio_tagging_loss=0.01216, over 15727.00 frames. ], tot_loss[loss=0.09747, simple_loss=0.1076, pruned_loss=0.02565, audio_tagging_loss=0.018, over 1610672.02 frames. ], batch size: 57, lr: 8.96e-03, grad_scale: 32.0 2023-11-19 04:17:41,619 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.70 vs. limit=15.0 2023-11-19 04:18:14,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=562280.0, ans=0.125 2023-11-19 04:18:14,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=562280.0, ans=0.2 2023-11-19 04:18:31,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=562346.6666666666, ans=0.0 2023-11-19 04:18:35,266 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 200, loss[loss=0.1026, simple_loss=0.122, pruned_loss=0.03175, audio_tagging_loss=0.009878, over 15439.00 frames. 
], tot_loss[loss=0.09561, simple_loss=0.1079, pruned_loss=0.02589, audio_tagging_loss=0.01575, over 1930186.56 frames. ], batch size: 56, lr: 8.96e-03, grad_scale: 32.0 2023-11-19 04:18:36,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=562413.3333333334, ans=0.2 2023-11-19 04:18:37,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=562413.3333333334, ans=0.07 2023-11-19 04:18:42,201 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=562413.3333333334, ans=0.125 2023-11-19 04:18:48,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=562480.0, ans=0.125 2023-11-19 04:18:51,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=562480.0, ans=0.0 2023-11-19 04:18:54,363 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.820e+01 8.627e+01 9.281e+01 9.933e+01 1.355e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-19 04:19:05,310 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:19:28,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=562680.0, ans=0.125 2023-11-19 04:19:31,614 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 250, loss[loss=0.09868, simple_loss=0.1201, pruned_loss=0.02996, audio_tagging_loss=0.008664, over 15159.00 frames. ], tot_loss[loss=0.09388, simple_loss=0.1081, pruned_loss=0.02556, audio_tagging_loss=0.01425, over 2187178.63 frames. ], batch size: 55, lr: 8.95e-03, grad_scale: 32.0 2023-11-19 04:19:44,513 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:19:50,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=562813.3333333334, ans=0.125 2023-11-19 04:20:01,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=562880.0, ans=0.2 2023-11-19 04:20:16,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=563013.3333333334, ans=0.1 2023-11-19 04:20:26,689 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 300, loss[loss=0.06775, simple_loss=0.08274, pruned_loss=0.01422, audio_tagging_loss=0.01216, over 15672.00 frames. ], tot_loss[loss=0.09317, simple_loss=0.1084, pruned_loss=0.02572, audio_tagging_loss=0.01323, over 2384778.88 frames. ], batch size: 57, lr: 8.95e-03, grad_scale: 32.0 2023-11-19 04:20:43,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=563146.6666666666, ans=0.2 2023-11-19 04:20:45,697 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.292e+01 8.672e+01 9.179e+01 1.018e+02 1.268e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-19 04:20:49,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=563213.3333333334, ans=0.125 2023-11-19 04:21:01,265 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.82 vs. 
limit=6.0 2023-11-19 04:21:22,021 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 350, loss[loss=0.08078, simple_loss=0.09001, pruned_loss=0.02504, audio_tagging_loss=0.01073, over 16879.00 frames. ], tot_loss[loss=0.09307, simple_loss=0.1094, pruned_loss=0.02598, audio_tagging_loss=0.01237, over 2533408.73 frames. ], batch size: 65, lr: 8.95e-03, grad_scale: 32.0 2023-11-19 04:21:30,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=563413.3333333334, ans=0.125 2023-11-19 04:21:45,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=563546.6666666666, ans=0.1 2023-11-19 04:21:51,775 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.37 vs. limit=15.0 2023-11-19 04:22:00,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=563613.3333333334, ans=0.125 2023-11-19 04:22:00,903 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=563613.3333333334, ans=0.0 2023-11-19 04:22:08,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=563680.0, ans=0.0 2023-11-19 04:22:16,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=563680.0, ans=0.125 2023-11-19 04:22:18,673 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 400, loss[loss=0.07785, simple_loss=0.09462, pruned_loss=0.01855, audio_tagging_loss=0.012, over 16099.00 frames. ], tot_loss[loss=0.09279, simple_loss=0.1096, pruned_loss=0.02602, audio_tagging_loss=0.01196, over 2646738.97 frames. ], batch size: 61, lr: 8.94e-03, grad_scale: 32.0 2023-11-19 04:22:19,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=563746.6666666666, ans=0.0 2023-11-19 04:22:23,432 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.96 vs. limit=22.5 2023-11-19 04:22:33,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=563813.3333333334, ans=0.125 2023-11-19 04:22:36,770 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.154e+01 8.502e+01 9.440e+01 1.057e+02 1.683e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-19 04:22:41,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=563880.0, ans=0.0 2023-11-19 04:22:41,423 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.45 vs. 
limit=15.0 2023-11-19 04:23:04,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=564013.3333333334, ans=0.125 2023-11-19 04:23:11,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=564013.3333333334, ans=0.2 2023-11-19 04:23:12,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=564080.0, ans=0.2 2023-11-19 04:23:13,480 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 450, loss[loss=0.08919, simple_loss=0.1106, pruned_loss=0.02426, audio_tagging_loss=0.009655, over 15520.00 frames. ], tot_loss[loss=0.09214, simple_loss=0.1093, pruned_loss=0.02585, audio_tagging_loss=0.01166, over 2732442.60 frames. ], batch size: 59, lr: 8.94e-03, grad_scale: 32.0 2023-11-19 04:23:19,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=564080.0, ans=0.1 2023-11-19 04:23:26,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=564146.6666666666, ans=0.0 2023-11-19 04:23:47,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=564280.0, ans=0.125 2023-11-19 04:23:54,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=564280.0, ans=0.1 2023-11-19 04:24:08,562 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 500, loss[loss=0.1053, simple_loss=0.1154, pruned_loss=0.03576, audio_tagging_loss=0.01184, over 15048.00 frames. ], tot_loss[loss=0.09309, simple_loss=0.1105, pruned_loss=0.02637, audio_tagging_loss=0.01146, over 2810006.48 frames. ], batch size: 57, lr: 8.94e-03, grad_scale: 32.0 2023-11-19 04:24:11,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=564413.3333333334, ans=0.125 2023-11-19 04:24:13,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=564413.3333333334, ans=0.125 2023-11-19 04:24:14,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=564413.3333333334, ans=0.1 2023-11-19 04:24:28,809 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.084e+01 8.601e+01 9.237e+01 1.002e+02 1.241e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 04:25:02,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=564680.0, ans=0.1 2023-11-19 04:25:04,842 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 550, loss[loss=0.06673, simple_loss=0.07447, pruned_loss=0.01755, audio_tagging_loss=0.01194, over 14950.00 frames. ], tot_loss[loss=0.09225, simple_loss=0.1094, pruned_loss=0.02618, audio_tagging_loss=0.01138, over 2863207.50 frames. ], batch size: 58, lr: 8.94e-03, grad_scale: 32.0 2023-11-19 04:25:17,800 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.26 vs. limit=15.0 2023-11-19 04:25:23,897 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.32 vs. 
limit=10.0 2023-11-19 04:25:54,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=565013.3333333334, ans=0.0 2023-11-19 04:26:00,650 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 600, loss[loss=0.07831, simple_loss=0.09747, pruned_loss=0.0183, audio_tagging_loss=0.01128, over 15519.00 frames. ], tot_loss[loss=0.09304, simple_loss=0.1104, pruned_loss=0.02644, audio_tagging_loss=0.0114, over 2905365.82 frames. ], batch size: 58, lr: 8.93e-03, grad_scale: 32.0 2023-11-19 04:26:18,642 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.289e+01 8.437e+01 9.383e+01 9.998e+01 1.583e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-19 04:26:45,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=565346.6666666666, ans=0.125 2023-11-19 04:26:55,373 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.98 vs. limit=12.0 2023-11-19 04:26:56,077 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 650, loss[loss=0.08288, simple_loss=0.1068, pruned_loss=0.02027, audio_tagging_loss=0.009181, over 15919.00 frames. ], tot_loss[loss=0.09268, simple_loss=0.1102, pruned_loss=0.02628, audio_tagging_loss=0.01131, over 2934228.78 frames. ], batch size: 58, lr: 8.93e-03, grad_scale: 32.0 2023-11-19 04:27:22,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=565546.6666666666, ans=0.5 2023-11-19 04:27:40,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=565680.0, ans=0.125 2023-11-19 04:27:41,940 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.288e-01 2023-11-19 04:27:46,960 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2023-11-19 04:27:47,649 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=565680.0, ans=0.125 2023-11-19 04:27:51,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=565746.6666666666, ans=0.125 2023-11-19 04:27:52,225 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 700, loss[loss=0.08204, simple_loss=0.09774, pruned_loss=0.02472, audio_tagging_loss=0.008452, over 14698.00 frames. ], tot_loss[loss=0.09307, simple_loss=0.1108, pruned_loss=0.02655, audio_tagging_loss=0.01109, over 2956012.48 frames. 
], batch size: 57, lr: 8.93e-03, grad_scale: 32.0 2023-11-19 04:27:59,271 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:28:02,469 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:28:03,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=565813.3333333334, ans=0.125 2023-11-19 04:28:10,735 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.869e+01 8.560e+01 9.295e+01 1.024e+02 1.604e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 04:28:39,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=566013.3333333334, ans=0.1 2023-11-19 04:28:39,682 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.00 vs. limit=22.5 2023-11-19 04:28:42,874 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.82 vs. limit=12.0 2023-11-19 04:28:47,738 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 750, loss[loss=0.07646, simple_loss=0.08997, pruned_loss=0.0185, audio_tagging_loss=0.01297, over 15551.00 frames. ], tot_loss[loss=0.09312, simple_loss=0.111, pruned_loss=0.02658, audio_tagging_loss=0.01104, over 2986137.24 frames. ], batch size: 59, lr: 8.93e-03, grad_scale: 16.0 2023-11-19 04:28:56,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=566080.0, ans=0.125 2023-11-19 04:29:08,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=566213.3333333334, ans=0.025 2023-11-19 04:29:16,765 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.38 vs. limit=22.5 2023-11-19 04:29:42,383 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 800, loss[loss=0.1067, simple_loss=0.1233, pruned_loss=0.03296, audio_tagging_loss=0.01207, over 16272.00 frames. ], tot_loss[loss=0.09259, simple_loss=0.1106, pruned_loss=0.02632, audio_tagging_loss=0.01098, over 3005652.56 frames. 
], batch size: 58, lr: 8.92e-03, grad_scale: 32.0 2023-11-19 04:29:52,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=566480.0, ans=0.2 2023-11-19 04:29:54,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=566480.0, ans=0.0 2023-11-19 04:29:58,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=566480.0, ans=0.125 2023-11-19 04:30:01,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=566480.0, ans=0.125 2023-11-19 04:30:02,734 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 8.557e+01 9.435e+01 1.048e+02 1.522e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-19 04:30:31,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=566680.0, ans=0.1 2023-11-19 04:30:33,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=566680.0, ans=0.0 2023-11-19 04:30:35,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=566680.0, ans=0.2 2023-11-19 04:30:38,427 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 850, loss[loss=0.09096, simple_loss=0.1117, pruned_loss=0.02563, audio_tagging_loss=0.009486, over 15821.00 frames. ], tot_loss[loss=0.09309, simple_loss=0.1114, pruned_loss=0.02646, audio_tagging_loss=0.01093, over 3014244.78 frames. ], batch size: 58, lr: 8.92e-03, grad_scale: 32.0 2023-11-19 04:30:42,115 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.95 vs. limit=15.0 2023-11-19 04:30:57,722 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.92 vs. limit=15.0 2023-11-19 04:31:01,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=566880.0, ans=0.125 2023-11-19 04:31:05,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=566880.0, ans=0.05 2023-11-19 04:31:19,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=566946.6666666666, ans=0.1 2023-11-19 04:31:21,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=567013.3333333334, ans=0.2 2023-11-19 04:31:29,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=567013.3333333334, ans=0.0 2023-11-19 04:31:34,305 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 900, loss[loss=0.07335, simple_loss=0.08908, pruned_loss=0.01799, audio_tagging_loss=0.01082, over 14701.00 frames. ], tot_loss[loss=0.09258, simple_loss=0.1103, pruned_loss=0.02632, audio_tagging_loss=0.01108, over 3017596.17 frames. 
], batch size: 56, lr: 8.92e-03, grad_scale: 32.0 2023-11-19 04:31:34,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=567080.0, ans=0.125 2023-11-19 04:31:36,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=567080.0, ans=0.0 2023-11-19 04:31:48,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=567146.6666666666, ans=0.95 2023-11-19 04:31:53,697 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.715e+01 8.468e+01 9.134e+01 1.003e+02 1.510e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-19 04:32:00,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=567213.3333333334, ans=0.0 2023-11-19 04:32:19,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=567346.6666666666, ans=0.125 2023-11-19 04:32:26,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=567346.6666666666, ans=0.1 2023-11-19 04:32:29,739 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 950, loss[loss=0.09071, simple_loss=0.112, pruned_loss=0.0255, audio_tagging_loss=0.00919, over 15127.00 frames. ], tot_loss[loss=0.09249, simple_loss=0.1107, pruned_loss=0.02616, audio_tagging_loss=0.01096, over 3028649.39 frames. ], batch size: 55, lr: 8.92e-03, grad_scale: 32.0 2023-11-19 04:32:34,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=567413.3333333334, ans=0.125 2023-11-19 04:32:36,306 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:32:50,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=567480.0, ans=0.125 2023-11-19 04:32:51,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=567546.6666666666, ans=0.125 2023-11-19 04:33:25,114 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 1000, loss[loss=0.09231, simple_loss=0.1207, pruned_loss=0.02458, audio_tagging_loss=0.007385, over 15423.00 frames. ], tot_loss[loss=0.09112, simple_loss=0.1095, pruned_loss=0.0257, audio_tagging_loss=0.01068, over 3035265.39 frames. 
], batch size: 58, lr: 8.91e-03, grad_scale: 16.0 2023-11-19 04:33:25,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=567746.6666666666, ans=0.0 2023-11-19 04:33:26,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=567746.6666666666, ans=0.125 2023-11-19 04:33:32,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=567746.6666666666, ans=0.015 2023-11-19 04:33:46,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=567813.3333333334, ans=0.125 2023-11-19 04:33:47,007 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.930e+01 8.670e+01 9.529e+01 1.041e+02 1.429e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-19 04:33:49,211 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 04:34:04,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=567946.6666666666, ans=0.0 2023-11-19 04:34:21,674 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 1050, loss[loss=0.07735, simple_loss=0.08769, pruned_loss=0.02149, audio_tagging_loss=0.01201, over 14496.00 frames. ], tot_loss[loss=0.09122, simple_loss=0.1093, pruned_loss=0.02591, audio_tagging_loss=0.01065, over 3035979.37 frames. ], batch size: 55, lr: 8.91e-03, grad_scale: 16.0 2023-11-19 04:34:27,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=568080.0, ans=0.5 2023-11-19 04:34:34,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=568146.6666666666, ans=0.125 2023-11-19 04:34:48,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=568213.3333333334, ans=0.125 2023-11-19 04:34:53,895 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=568280.0, ans=0.0 2023-11-19 04:35:17,066 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 1100, loss[loss=0.09333, simple_loss=0.1215, pruned_loss=0.02263, audio_tagging_loss=0.009967, over 15311.00 frames. ], tot_loss[loss=0.0904, simple_loss=0.1081, pruned_loss=0.02562, audio_tagging_loss=0.01075, over 3028191.54 frames. ], batch size: 54, lr: 8.91e-03, grad_scale: 16.0 2023-11-19 04:35:19,231 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 04:35:22,988 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.90 vs. limit=15.0 2023-11-19 04:35:38,089 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 8.818e+01 9.664e+01 1.074e+02 1.667e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-19 04:35:40,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=568546.6666666666, ans=0.125 2023-11-19 04:35:43,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=568546.6666666666, ans=0.125 2023-11-19 04:35:53,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=568613.3333333334, ans=0.125 2023-11-19 04:35:54,048 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.61 vs. limit=15.0 2023-11-19 04:36:12,562 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 1150, loss[loss=0.1114, simple_loss=0.1425, pruned_loss=0.03304, audio_tagging_loss=0.00715, over 15155.00 frames. ], tot_loss[loss=0.08987, simple_loss=0.1073, pruned_loss=0.02548, audio_tagging_loss=0.01072, over 3031141.01 frames. ], batch size: 56, lr: 8.91e-03, grad_scale: 16.0 2023-11-19 04:36:37,111 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=568880.0, ans=0.125 2023-11-19 04:36:46,682 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=12.0 2023-11-19 04:36:52,012 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.74 vs. limit=12.0 2023-11-19 04:36:52,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=568946.6666666666, ans=0.2 2023-11-19 04:36:56,550 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.15 vs. limit=22.5 2023-11-19 04:37:00,337 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.78 vs. limit=15.0 2023-11-19 04:37:08,816 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 1200, loss[loss=0.1021, simple_loss=0.1243, pruned_loss=0.02997, audio_tagging_loss=0.009989, over 15845.00 frames. ], tot_loss[loss=0.09097, simple_loss=0.1085, pruned_loss=0.02608, audio_tagging_loss=0.01065, over 3036110.98 frames. 
], batch size: 57, lr: 8.90e-03, grad_scale: 32.0 2023-11-19 04:37:13,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=569080.0, ans=0.125 2023-11-19 04:37:29,360 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.648e+01 9.273e+01 1.051e+02 1.425e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-19 04:37:36,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=569213.3333333334, ans=0.0 2023-11-19 04:37:44,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=569280.0, ans=0.0 2023-11-19 04:37:56,010 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:38:04,296 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 1250, loss[loss=0.09814, simple_loss=0.1213, pruned_loss=0.0268, audio_tagging_loss=0.01069, over 15099.00 frames. ], tot_loss[loss=0.09131, simple_loss=0.1091, pruned_loss=0.02618, audio_tagging_loss=0.0106, over 3037092.97 frames. ], batch size: 55, lr: 8.90e-03, grad_scale: 32.0 2023-11-19 04:38:10,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=569413.3333333334, ans=0.1 2023-11-19 04:38:16,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=569480.0, ans=0.125 2023-11-19 04:38:21,762 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.11 vs. limit=15.0 2023-11-19 04:38:26,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=569546.6666666666, ans=0.0 2023-11-19 04:38:37,496 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0 2023-11-19 04:38:41,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=569613.3333333334, ans=0.2 2023-11-19 04:38:54,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=569680.0, ans=0.125 2023-11-19 04:38:59,916 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 1300, loss[loss=0.1242, simple_loss=0.1361, pruned_loss=0.04009, audio_tagging_loss=0.01607, over 15315.00 frames. ], tot_loss[loss=0.09057, simple_loss=0.1085, pruned_loss=0.02575, audio_tagging_loss=0.01058, over 3040204.77 frames. ], batch size: 56, lr: 8.90e-03, grad_scale: 32.0 2023-11-19 04:39:18,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=569813.3333333334, ans=0.125 2023-11-19 04:39:21,656 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.412e+01 8.385e+01 9.003e+01 9.844e+01 1.320e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-19 04:39:38,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=569946.6666666666, ans=0.125 2023-11-19 04:39:56,375 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 1350, loss[loss=0.07991, simple_loss=0.09588, pruned_loss=0.02172, audio_tagging_loss=0.01025, over 15684.00 frames. 
], tot_loss[loss=0.09038, simple_loss=0.1084, pruned_loss=0.02565, audio_tagging_loss=0.01052, over 3040878.89 frames. ], batch size: 57, lr: 8.90e-03, grad_scale: 32.0 2023-11-19 04:40:17,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=570213.3333333334, ans=0.0 2023-11-19 04:40:19,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=570213.3333333334, ans=0.0 2023-11-19 04:40:22,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=570213.3333333334, ans=0.2 2023-11-19 04:40:27,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=570213.3333333334, ans=0.05 2023-11-19 04:40:33,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=570280.0, ans=0.0 2023-11-19 04:40:36,361 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 04:40:51,808 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 1400, loss[loss=0.08126, simple_loss=0.1027, pruned_loss=0.02127, audio_tagging_loss=0.008639, over 14722.00 frames. ], tot_loss[loss=0.09058, simple_loss=0.1086, pruned_loss=0.02566, audio_tagging_loss=0.01063, over 3042026.08 frames. ], batch size: 53, lr: 8.89e-03, grad_scale: 32.0 2023-11-19 04:40:52,481 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.35 vs. limit=15.0 2023-11-19 04:40:53,598 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.72 vs. limit=15.0 2023-11-19 04:40:54,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=570413.3333333334, ans=0.5 2023-11-19 04:41:11,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=570480.0, ans=0.0 2023-11-19 04:41:13,424 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.757e+01 9.593e+01 1.066e+02 1.571e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-19 04:41:13,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=570546.6666666666, ans=0.0 2023-11-19 04:41:21,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=570546.6666666666, ans=0.09899494936611666 2023-11-19 04:41:31,379 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.60 vs. 
limit=6.0 2023-11-19 04:41:37,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=570680.0, ans=10.0 2023-11-19 04:41:47,462 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 1450, loss[loss=0.1005, simple_loss=0.1261, pruned_loss=0.02915, audio_tagging_loss=0.008333, over 15371.00 frames. ], tot_loss[loss=0.09194, simple_loss=0.1104, pruned_loss=0.02594, audio_tagging_loss=0.01079, over 3046629.08 frames. ], batch size: 55, lr: 8.89e-03, grad_scale: 32.0 2023-11-19 04:42:02,473 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. limit=6.0 2023-11-19 04:42:04,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=570813.3333333334, ans=0.2 2023-11-19 04:42:17,862 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.45 vs. limit=10.0 2023-11-19 04:42:19,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=570946.6666666666, ans=0.1 2023-11-19 04:42:20,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=570946.6666666666, ans=0.125 2023-11-19 04:42:30,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=570946.6666666666, ans=0.0 2023-11-19 04:42:41,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=571013.3333333334, ans=0.0 2023-11-19 04:42:43,688 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 1500, loss[loss=0.09806, simple_loss=0.1116, pruned_loss=0.02887, audio_tagging_loss=0.0134, over 14382.00 frames. ], tot_loss[loss=0.09234, simple_loss=0.1109, pruned_loss=0.02614, audio_tagging_loss=0.01074, over 3049116.53 frames. ], batch size: 55, lr: 8.89e-03, grad_scale: 32.0 2023-11-19 04:42:59,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=571146.6666666666, ans=0.125 2023-11-19 04:43:04,351 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 8.390e+01 9.200e+01 9.780e+01 1.571e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-19 04:43:21,321 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.71 vs. limit=15.0 2023-11-19 04:43:29,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=571346.6666666666, ans=0.2 2023-11-19 04:43:39,282 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 1550, loss[loss=0.08712, simple_loss=0.102, pruned_loss=0.02592, audio_tagging_loss=0.0102, over 16550.00 frames. ], tot_loss[loss=0.09269, simple_loss=0.1114, pruned_loss=0.02626, audio_tagging_loss=0.01073, over 3054717.19 frames. ], batch size: 64, lr: 8.88e-03, grad_scale: 32.0 2023-11-19 04:43:43,073 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.09 vs. 
limit=10.0 2023-11-19 04:43:44,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=571413.3333333334, ans=0.1 2023-11-19 04:44:02,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=571546.6666666666, ans=0.2 2023-11-19 04:44:34,453 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 1600, loss[loss=0.08184, simple_loss=0.09328, pruned_loss=0.02573, audio_tagging_loss=0.009477, over 16313.00 frames. ], tot_loss[loss=0.09274, simple_loss=0.1107, pruned_loss=0.02651, audio_tagging_loss=0.01089, over 3058362.84 frames. ], batch size: 64, lr: 8.88e-03, grad_scale: 32.0 2023-11-19 04:44:35,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=571746.6666666666, ans=0.125 2023-11-19 04:44:45,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=571813.3333333334, ans=0.1 2023-11-19 04:44:49,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=571813.3333333334, ans=0.125 2023-11-19 04:44:49,730 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.11 vs. limit=15.0 2023-11-19 04:44:56,105 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.975e+01 9.863e+01 1.094e+02 1.850e+02, threshold=1.973e+02, percent-clipped=1.0 2023-11-19 04:45:04,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=571880.0, ans=0.1 2023-11-19 04:45:18,366 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0 2023-11-19 04:45:31,003 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 1650, loss[loss=0.1073, simple_loss=0.1274, pruned_loss=0.03617, audio_tagging_loss=0.007448, over 15601.00 frames. ], tot_loss[loss=0.09192, simple_loss=0.1098, pruned_loss=0.02607, audio_tagging_loss=0.01097, over 3055643.70 frames. ], batch size: 58, lr: 8.88e-03, grad_scale: 32.0 2023-11-19 04:45:35,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=572080.0, ans=0.125 2023-11-19 04:45:52,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=572213.3333333334, ans=0.0 2023-11-19 04:46:11,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=572280.0, ans=0.2 2023-11-19 04:46:16,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=572346.6666666666, ans=0.0 2023-11-19 04:46:20,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=572346.6666666666, ans=0.1 2023-11-19 04:46:26,833 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 1700, loss[loss=0.1024, simple_loss=0.127, pruned_loss=0.03108, audio_tagging_loss=0.007889, over 14783.00 frames. ], tot_loss[loss=0.09169, simple_loss=0.1097, pruned_loss=0.0258, audio_tagging_loss=0.01104, over 3049256.03 frames. 
], batch size: 56, lr: 8.88e-03, grad_scale: 32.0 2023-11-19 04:46:29,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=572413.3333333334, ans=0.125 2023-11-19 04:46:38,908 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.02 vs. limit=15.0 2023-11-19 04:46:47,242 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 8.408e+01 9.171e+01 1.022e+02 1.332e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-19 04:47:00,455 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.22 vs. limit=15.0 2023-11-19 04:47:21,125 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 1750, loss[loss=0.08437, simple_loss=0.1024, pruned_loss=0.02511, audio_tagging_loss=0.008076, over 15220.00 frames. ], tot_loss[loss=0.09095, simple_loss=0.1086, pruned_loss=0.02564, audio_tagging_loss=0.011, over 3040350.19 frames. ], batch size: 56, lr: 8.87e-03, grad_scale: 32.0 2023-11-19 04:47:24,963 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.68 vs. limit=22.5 2023-11-19 04:47:34,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=572813.3333333334, ans=0.2 2023-11-19 04:47:41,441 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=572813.3333333334, ans=0.125 2023-11-19 04:47:52,010 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=572880.0, ans=0.125 2023-11-19 04:47:56,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=572946.6666666666, ans=0.0 2023-11-19 04:48:01,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=572946.6666666666, ans=0.0 2023-11-19 04:48:11,958 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.75 vs. limit=15.0 2023-11-19 04:48:12,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=573013.3333333334, ans=0.09899494936611666 2023-11-19 04:48:17,892 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 1800, loss[loss=0.1125, simple_loss=0.1315, pruned_loss=0.03594, audio_tagging_loss=0.01078, over 15356.00 frames. ], tot_loss[loss=0.09127, simple_loss=0.1091, pruned_loss=0.02587, audio_tagging_loss=0.01087, over 3041748.33 frames. 
], batch size: 56, lr: 8.87e-03, grad_scale: 32.0 2023-11-19 04:48:37,666 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=573146.6666666666, ans=0.09899494936611666 2023-11-19 04:48:38,398 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.757e+01 8.476e+01 9.222e+01 1.009e+02 1.227e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-19 04:48:41,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=573213.3333333334, ans=0.0 2023-11-19 04:48:43,069 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.29 vs. limit=10.0 2023-11-19 04:48:44,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=573213.3333333334, ans=0.1 2023-11-19 04:49:03,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=573346.6666666666, ans=0.0 2023-11-19 04:49:13,825 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 1850, loss[loss=0.09927, simple_loss=0.1178, pruned_loss=0.03336, audio_tagging_loss=0.007002, over 15307.00 frames. ], tot_loss[loss=0.09193, simple_loss=0.11, pruned_loss=0.02619, audio_tagging_loss=0.01072, over 3047388.90 frames. ], batch size: 56, lr: 8.87e-03, grad_scale: 32.0 2023-11-19 04:49:28,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=573480.0, ans=0.125 2023-11-19 04:49:29,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=573480.0, ans=0.0 2023-11-19 04:49:58,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=573680.0, ans=0.2 2023-11-19 04:50:01,058 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.04 vs. limit=15.0 2023-11-19 04:50:09,106 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 1900, loss[loss=0.09979, simple_loss=0.1209, pruned_loss=0.02834, audio_tagging_loss=0.01101, over 16806.00 frames. ], tot_loss[loss=0.09141, simple_loss=0.1093, pruned_loss=0.02606, audio_tagging_loss=0.01072, over 3051026.46 frames. 
], batch size: 60, lr: 8.87e-03, grad_scale: 32.0 2023-11-19 04:50:16,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=573746.6666666666, ans=0.02 2023-11-19 04:50:20,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=573813.3333333334, ans=0.04949747468305833 2023-11-19 04:50:25,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=573813.3333333334, ans=0.125 2023-11-19 04:50:31,202 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.045e+01 8.643e+01 9.371e+01 1.051e+02 1.561e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-19 04:50:32,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=573880.0, ans=0.0 2023-11-19 04:50:38,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=573880.0, ans=0.0 2023-11-19 04:50:46,774 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.39 vs. limit=12.0 2023-11-19 04:50:48,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=573946.6666666666, ans=0.04949747468305833 2023-11-19 04:50:48,868 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2023-11-19 04:50:53,003 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.11 vs. limit=15.0 2023-11-19 04:51:05,276 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 1950, loss[loss=0.07914, simple_loss=0.09412, pruned_loss=0.01835, audio_tagging_loss=0.01373, over 14934.00 frames. ], tot_loss[loss=0.09036, simple_loss=0.1081, pruned_loss=0.02558, audio_tagging_loss=0.01076, over 3051031.67 frames. ], batch size: 57, lr: 8.86e-03, grad_scale: 32.0 2023-11-19 04:51:07,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=574080.0, ans=0.125 2023-11-19 04:51:16,598 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.06 vs. 
limit=15.0 2023-11-19 04:51:18,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=574146.6666666666, ans=0.2 2023-11-19 04:51:25,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=574146.6666666666, ans=0.0 2023-11-19 04:51:30,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=574213.3333333334, ans=0.1 2023-11-19 04:51:39,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=574280.0, ans=0.125 2023-11-19 04:51:41,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=574280.0, ans=0.0 2023-11-19 04:51:43,043 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:51:59,965 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.61 vs. limit=15.0 2023-11-19 04:51:59,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=574346.6666666666, ans=22.5 2023-11-19 04:52:01,535 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 2000, loss[loss=0.07568, simple_loss=0.0758, pruned_loss=0.02625, audio_tagging_loss=0.01154, over 13975.00 frames. ], tot_loss[loss=0.09019, simple_loss=0.1078, pruned_loss=0.02565, audio_tagging_loss=0.01062, over 3052206.92 frames. ], batch size: 53, lr: 8.86e-03, grad_scale: 32.0 2023-11-19 04:52:01,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=574413.3333333334, ans=0.125 2023-11-19 04:52:06,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=574413.3333333334, ans=0.1 2023-11-19 04:52:21,799 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.904e+01 9.748e+01 1.142e+02 1.614e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-19 04:52:37,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=574613.3333333334, ans=0.0 2023-11-19 04:52:48,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=574680.0, ans=0.0 2023-11-19 04:52:57,085 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 2050, loss[loss=0.09526, simple_loss=0.1161, pruned_loss=0.02471, audio_tagging_loss=0.01248, over 14309.00 frames. ], tot_loss[loss=0.09012, simple_loss=0.1077, pruned_loss=0.02569, audio_tagging_loss=0.01057, over 3047716.62 frames. 
], batch size: 53, lr: 8.86e-03, grad_scale: 32.0 2023-11-19 04:53:00,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=574746.6666666666, ans=0.0 2023-11-19 04:53:14,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=574813.3333333334, ans=0.1 2023-11-19 04:53:36,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=574946.6666666666, ans=0.125 2023-11-19 04:53:51,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=575080.0, ans=0.125 2023-11-19 04:53:52,679 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 2100, loss[loss=0.08352, simple_loss=0.1084, pruned_loss=0.01762, audio_tagging_loss=0.01171, over 14538.00 frames. ], tot_loss[loss=0.09001, simple_loss=0.1077, pruned_loss=0.02553, audio_tagging_loss=0.01063, over 3044957.55 frames. ], batch size: 54, lr: 8.86e-03, grad_scale: 32.0 2023-11-19 04:53:57,739 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=575080.0, ans=0.1 2023-11-19 04:54:14,254 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.268e+01 8.570e+01 9.138e+01 1.001e+02 1.384e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-19 04:54:18,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=575213.3333333334, ans=0.0 2023-11-19 04:54:31,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=575280.0, ans=0.0 2023-11-19 04:54:41,728 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:54:48,407 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 2150, loss[loss=0.08165, simple_loss=0.1005, pruned_loss=0.02027, audio_tagging_loss=0.01114, over 15178.00 frames. ], tot_loss[loss=0.09045, simple_loss=0.1082, pruned_loss=0.02569, audio_tagging_loss=0.01066, over 3045458.03 frames. ], batch size: 56, lr: 8.85e-03, grad_scale: 32.0 2023-11-19 04:55:04,702 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.73 vs. limit=12.0 2023-11-19 04:55:05,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=575480.0, ans=0.125 2023-11-19 04:55:09,400 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:55:15,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=575546.6666666666, ans=0.0 2023-11-19 04:55:20,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=575546.6666666666, ans=0.0 2023-11-19 04:55:20,942 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
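The "Exclude cut" warnings above are internally consistent: the 1-second placeholder clips have 100 feature frames, the roughly 4x convolutional subsampling leaves 23, and the dummy transcript tokenizes to 24 BPE pieces, so there are fewer frames than tokens and the transducer cannot align the pair. A minimal sketch of that arithmetic, assuming the usual ((T - 7) // 2 + 1) // 2 subsampled-length formula and a "subsampled frames >= tokens" keep rule; both are inferences from the logged numbers, not code taken from this train_asr.py:

```python
# Sketch only: reproduce the frame/token arithmetic behind the
# "Exclude cut ..." WARNING records (formula and filter rule are assumptions).

def subsampled_length(num_frames: int) -> int:
    # Output length of a ~4x conv subsampling front-end; gives 100 -> 23,
    # matching "before subsampling: 100 ... after subsampling: 23".
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # The transducer loss needs at least as many encoder frames as output
    # tokens, so a cut whose subsampled length falls short is dropped.
    return subsampled_length(num_frames) >= num_tokens

assert subsampled_length(100) == 23
assert not keep_cut(100, 24)  # 23 frames < 24 tokens -> excluded, as logged
```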
2023-11-19 04:55:43,557 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.15 vs. limit=15.0 2023-11-19 04:55:43,944 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 2200, loss[loss=0.08337, simple_loss=0.1018, pruned_loss=0.02305, audio_tagging_loss=0.009405, over 15082.00 frames. ], tot_loss[loss=0.09096, simple_loss=0.1087, pruned_loss=0.02597, audio_tagging_loss=0.01065, over 3039958.67 frames. ], batch size: 58, lr: 8.85e-03, grad_scale: 32.0 2023-11-19 04:56:04,955 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.193e+01 8.523e+01 9.283e+01 9.995e+01 1.354e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-19 04:56:15,550 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.21 vs. limit=15.0 2023-11-19 04:56:20,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=575946.6666666666, ans=0.2 2023-11-19 04:56:20,355 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.10 vs. limit=10.0 2023-11-19 04:56:24,586 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.64 vs. limit=15.0 2023-11-19 04:56:39,878 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 2250, loss[loss=0.07457, simple_loss=0.0834, pruned_loss=0.0199, audio_tagging_loss=0.01297, over 15007.00 frames. ], tot_loss[loss=0.09123, simple_loss=0.1091, pruned_loss=0.02591, audio_tagging_loss=0.01076, over 3041552.14 frames. ], batch size: 57, lr: 8.85e-03, grad_scale: 32.0 2023-11-19 04:56:40,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=576080.0, ans=0.0 2023-11-19 04:56:40,371 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.90 vs. limit=6.0 2023-11-19 04:56:42,184 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:56:43,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=576080.0, ans=0.09899494936611666 2023-11-19 04:56:45,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=576080.0, ans=0.0 2023-11-19 04:56:51,210 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.82 vs. limit=10.0 2023-11-19 04:57:05,599 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.22 vs.
limit=22.5 2023-11-19 04:57:07,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=576213.3333333334, ans=0.125 2023-11-19 04:57:13,348 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=576280.0, ans=0.125 2023-11-19 04:57:21,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=576280.0, ans=0.0 2023-11-19 04:57:29,547 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.70 vs. limit=15.0 2023-11-19 04:57:30,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=576346.6666666666, ans=0.125 2023-11-19 04:57:35,790 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 2300, loss[loss=0.1305, simple_loss=0.1637, pruned_loss=0.0419, audio_tagging_loss=0.006729, over 17221.00 frames. ], tot_loss[loss=0.09154, simple_loss=0.1095, pruned_loss=0.02604, audio_tagging_loss=0.01075, over 3046521.15 frames. ], batch size: 61, lr: 8.85e-03, grad_scale: 16.0 2023-11-19 04:57:38,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=576413.3333333334, ans=0.125 2023-11-19 04:57:38,409 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.10 vs. limit=15.0 2023-11-19 04:57:57,358 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.832e+01 8.335e+01 9.344e+01 1.048e+02 1.433e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-19 04:58:00,100 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.27 vs. limit=15.0 2023-11-19 04:58:08,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=576613.3333333334, ans=0.125 2023-11-19 04:58:17,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=576613.3333333334, ans=0.125 2023-11-19 04:58:20,556 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0 2023-11-19 04:58:23,151 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 04:58:30,956 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 2350, loss[loss=0.06396, simple_loss=0.07634, pruned_loss=0.01455, audio_tagging_loss=0.01124, over 14455.00 frames. ], tot_loss[loss=0.09199, simple_loss=0.1098, pruned_loss=0.02632, audio_tagging_loss=0.01075, over 3056632.60 frames. 
], batch size: 57, lr: 8.84e-03, grad_scale: 16.0 2023-11-19 04:58:40,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=576813.3333333334, ans=0.1 2023-11-19 04:59:16,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=577013.3333333334, ans=0.0 2023-11-19 04:59:21,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=577013.3333333334, ans=0.2 2023-11-19 04:59:24,294 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=577013.3333333334, ans=0.125 2023-11-19 04:59:26,754 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 2400, loss[loss=0.1012, simple_loss=0.125, pruned_loss=0.02594, audio_tagging_loss=0.01278, over 14991.00 frames. ], tot_loss[loss=0.09181, simple_loss=0.1094, pruned_loss=0.02622, audio_tagging_loss=0.01087, over 3054470.19 frames. ], batch size: 57, lr: 8.84e-03, grad_scale: 32.0 2023-11-19 04:59:41,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=577146.6666666666, ans=0.125 2023-11-19 04:59:47,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=577146.6666666666, ans=0.125 2023-11-19 04:59:48,403 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.12 vs. limit=15.0 2023-11-19 04:59:48,960 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.521e+01 9.139e+01 1.013e+02 1.981e+02, threshold=1.828e+02, percent-clipped=1.0 2023-11-19 05:00:23,037 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 2450, loss[loss=0.07205, simple_loss=0.08358, pruned_loss=0.01661, audio_tagging_loss=0.01365, over 14715.00 frames. ], tot_loss[loss=0.09181, simple_loss=0.1094, pruned_loss=0.0262, audio_tagging_loss=0.01091, over 3055846.63 frames. ], batch size: 55, lr: 8.84e-03, grad_scale: 32.0 2023-11-19 05:00:25,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=577413.3333333334, ans=0.125 2023-11-19 05:00:44,476 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.02 vs. limit=15.0 2023-11-19 05:00:47,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=577546.6666666666, ans=0.0 2023-11-19 05:00:57,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=577613.3333333334, ans=0.1 2023-11-19 05:01:08,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=577680.0, ans=0.125 2023-11-19 05:01:16,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=577680.0, ans=0.0 2023-11-19 05:01:18,012 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 2500, loss[loss=0.06886, simple_loss=0.08061, pruned_loss=0.0183, audio_tagging_loss=0.01026, over 14856.00 frames. ], tot_loss[loss=0.09157, simple_loss=0.1092, pruned_loss=0.02608, audio_tagging_loss=0.0109, over 3050247.61 frames. 
], batch size: 58, lr: 8.84e-03, grad_scale: 16.0 2023-11-19 05:01:22,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=577746.6666666666, ans=0.125 2023-11-19 05:01:27,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=577746.6666666666, ans=0.125 2023-11-19 05:01:41,714 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.089e+01 8.392e+01 9.155e+01 9.880e+01 1.151e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-19 05:01:42,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=577880.0, ans=0.0 2023-11-19 05:02:13,211 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 2550, loss[loss=0.09934, simple_loss=0.1083, pruned_loss=0.03231, audio_tagging_loss=0.01288, over 13583.00 frames. ], tot_loss[loss=0.09064, simple_loss=0.1079, pruned_loss=0.02578, audio_tagging_loss=0.01089, over 3046477.96 frames. ], batch size: 54, lr: 8.83e-03, grad_scale: 16.0 2023-11-19 05:02:26,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=578146.6666666666, ans=0.1 2023-11-19 05:02:41,990 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.69 vs. limit=22.5 2023-11-19 05:02:42,611 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=578213.3333333334, ans=0.125 2023-11-19 05:02:48,174 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.63 vs. limit=10.0 2023-11-19 05:02:59,138 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.59 vs. limit=15.0 2023-11-19 05:03:05,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=578346.6666666666, ans=0.125 2023-11-19 05:03:09,581 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 2600, loss[loss=0.09136, simple_loss=0.1158, pruned_loss=0.02649, audio_tagging_loss=0.006978, over 15544.00 frames. ], tot_loss[loss=0.09039, simple_loss=0.1079, pruned_loss=0.02579, audio_tagging_loss=0.01066, over 3045703.20 frames. ], batch size: 57, lr: 8.83e-03, grad_scale: 16.0 2023-11-19 05:03:32,524 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.956e+01 8.609e+01 9.579e+01 1.039e+02 1.650e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-19 05:03:48,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=578613.3333333334, ans=0.0 2023-11-19 05:03:53,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=578613.3333333334, ans=0.0 2023-11-19 05:03:57,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=578680.0, ans=0.0 2023-11-19 05:04:05,561 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 2650, loss[loss=0.1011, simple_loss=0.1248, pruned_loss=0.03098, audio_tagging_loss=0.007723, over 15428.00 frames. ], tot_loss[loss=0.09089, simple_loss=0.1085, pruned_loss=0.02608, audio_tagging_loss=0.01055, over 3049981.30 frames. 
], batch size: 56, lr: 8.83e-03, grad_scale: 16.0 2023-11-19 05:04:14,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=578746.6666666666, ans=0.0 2023-11-19 05:04:23,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=578813.3333333334, ans=0.125 2023-11-19 05:04:33,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=578880.0, ans=0.125 2023-11-19 05:04:44,156 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.18 vs. limit=10.0 2023-11-19 05:04:45,713 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=578946.6666666666, ans=0.125 2023-11-19 05:05:00,344 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 2700, loss[loss=0.07402, simple_loss=0.07457, pruned_loss=0.02471, audio_tagging_loss=0.01202, over 16062.00 frames. ], tot_loss[loss=0.09035, simple_loss=0.1081, pruned_loss=0.02585, audio_tagging_loss=0.01045, over 3057139.41 frames. ], batch size: 62, lr: 8.83e-03, grad_scale: 16.0 2023-11-19 05:05:09,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=579080.0, ans=0.2 2023-11-19 05:05:09,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=579080.0, ans=0.125 2023-11-19 05:05:13,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=579146.6666666666, ans=0.1 2023-11-19 05:05:18,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=579146.6666666666, ans=0.125 2023-11-19 05:05:24,702 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.328e+01 8.403e+01 9.195e+01 1.002e+02 1.372e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-19 05:05:29,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=579213.3333333334, ans=0.0 2023-11-19 05:05:33,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=579280.0, ans=0.5 2023-11-19 05:05:50,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=579346.6666666666, ans=0.0 2023-11-19 05:05:57,146 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 2750, loss[loss=0.0951, simple_loss=0.1129, pruned_loss=0.0268, audio_tagging_loss=0.01186, over 15060.00 frames. ], tot_loss[loss=0.09073, simple_loss=0.109, pruned_loss=0.02587, audio_tagging_loss=0.01038, over 3061285.18 frames. ], batch size: 59, lr: 8.82e-03, grad_scale: 16.0 2023-11-19 05:06:08,510 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=579480.0, ans=0.125 2023-11-19 05:06:21,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=579546.6666666666, ans=0.2 2023-11-19 05:06:44,671 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
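A regularity worth noting in the recurring optim.py records: the five "grad-norm quartiles" read like the 0/25/50/75/100th percentiles of a recent gradient-norm history, and the reported threshold is always exactly Clipping_scale times the middle value, e.g. 2.0 * 9.195e+01 = 1.839e+02 in the record above. A hedged sketch of that relation, inferred from the logged numbers rather than lifted from optim.py:

```python
# Sketch (an inference from the log, not optim.py itself): the clipping
# threshold tracks the median of recent gradient norms, scaled by
# Clipping_scale; "percent-clipped" would then count how often a batch's
# gradient norm exceeded that threshold.
import torch

def grad_norm_summary(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
    # Five "quartiles" = 0/25/50/75/100th percentiles of the recent history.
    q = torch.quantile(recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()  # clipping_scale * median
    return q, threshold

# Using the quartiles from the record above as a stand-in history:
norms = torch.tensor([73.28, 84.03, 91.95, 100.2, 137.2])
_, thr = grad_norm_summary(norms)
print(f"threshold={thr:.4g}")  # 183.9, matching "threshold=1.839e+02"
```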
2023-11-19 05:06:48,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=579680.0, ans=0.125 2023-11-19 05:06:51,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=579680.0, ans=0.05 2023-11-19 05:06:52,635 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.66 vs. limit=22.5 2023-11-19 05:06:52,971 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 2800, loss[loss=0.09249, simple_loss=0.1038, pruned_loss=0.02787, audio_tagging_loss=0.01272, over 15444.00 frames. ], tot_loss[loss=0.09024, simple_loss=0.1084, pruned_loss=0.02565, audio_tagging_loss=0.0104, over 3053414.82 frames. ], batch size: 61, lr: 8.82e-03, grad_scale: 32.0 2023-11-19 05:06:55,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=579746.6666666666, ans=0.125 2023-11-19 05:07:16,555 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.973e+01 8.959e+01 9.930e+01 1.123e+02 1.609e+02, threshold=1.986e+02, percent-clipped=0.0 2023-11-19 05:07:18,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=579880.0, ans=0.2 2023-11-19 05:07:19,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=579880.0, ans=0.125 2023-11-19 05:07:29,515 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.15 vs. limit=15.0 2023-11-19 05:07:30,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=579946.6666666666, ans=0.04949747468305833 2023-11-19 05:07:37,112 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.78 vs. limit=12.0 2023-11-19 05:07:38,880 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:07:48,250 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 2850, loss[loss=0.08157, simple_loss=0.09765, pruned_loss=0.02282, audio_tagging_loss=0.009929, over 15286.00 frames. ], tot_loss[loss=0.09065, simple_loss=0.1089, pruned_loss=0.0258, audio_tagging_loss=0.01039, over 3054753.58 frames. ], batch size: 60, lr: 8.82e-03, grad_scale: 32.0 2023-11-19 05:07:51,870 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=580080.0, ans=0.125 2023-11-19 05:07:55,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=580080.0, ans=0.0 2023-11-19 05:08:24,985 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.40 vs.
limit=12.0 2023-11-19 05:08:26,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=580280.0, ans=10.0 2023-11-19 05:08:27,927 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=580280.0, ans=0.0 2023-11-19 05:08:45,318 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 2900, loss[loss=0.0994, simple_loss=0.1075, pruned_loss=0.03376, audio_tagging_loss=0.01191, over 14051.00 frames. ], tot_loss[loss=0.09061, simple_loss=0.1084, pruned_loss=0.02588, audio_tagging_loss=0.01053, over 3047065.28 frames. ], batch size: 55, lr: 8.82e-03, grad_scale: 32.0 2023-11-19 05:09:08,429 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.767e+01 8.526e+01 9.240e+01 1.001e+02 1.332e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-19 05:09:22,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=580613.3333333334, ans=0.125 2023-11-19 05:09:41,482 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 2950, loss[loss=0.08241, simple_loss=0.1006, pruned_loss=0.02084, audio_tagging_loss=0.01127, over 15646.00 frames. ], tot_loss[loss=0.09134, simple_loss=0.1096, pruned_loss=0.02595, audio_tagging_loss=0.01061, over 3047082.00 frames. ], batch size: 57, lr: 8.81e-03, grad_scale: 32.0 2023-11-19 05:10:20,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=580946.6666666666, ans=0.02 2023-11-19 05:10:23,966 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.57 vs. limit=15.0 2023-11-19 05:10:24,473 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=580946.6666666666, ans=0.125 2023-11-19 05:10:34,202 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.14 vs. limit=22.5 2023-11-19 05:10:36,736 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 3000, loss[loss=0.06367, simple_loss=0.07651, pruned_loss=0.01354, audio_tagging_loss=0.01188, over 15107.00 frames. ], tot_loss[loss=0.09153, simple_loss=0.1097, pruned_loss=0.02599, audio_tagging_loss=0.01072, over 3047110.14 frames. ], batch size: 59, lr: 8.81e-03, grad_scale: 32.0 2023-11-19 05:10:36,736 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-19 05:11:08,545 INFO [train_asr.py:1147] (3/4) Epoch 8, validation: loss=0.06637, simple_loss=0.05694, pruned_loss=0.00724, audio_tagging_loss=0.03066, over 4681554.00 frames. 2023-11-19 05:11:08,546 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-19 05:11:20,086 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.87 vs. 
limit=22.5 2023-11-19 05:11:30,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=581213.3333333334, ans=0.125 2023-11-19 05:11:31,375 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.149e+01 8.419e+01 9.133e+01 9.790e+01 1.190e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-19 05:11:32,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=581213.3333333334, ans=0.0 2023-11-19 05:11:54,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=581346.6666666666, ans=15.0 2023-11-19 05:11:54,570 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.41 vs. limit=15.0 2023-11-19 05:12:04,543 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 3050, loss[loss=0.06427, simple_loss=0.07512, pruned_loss=0.01678, audio_tagging_loss=0.009931, over 15650.00 frames. ], tot_loss[loss=0.09174, simple_loss=0.1099, pruned_loss=0.02611, audio_tagging_loss=0.01068, over 3044247.17 frames. ], batch size: 60, lr: 8.81e-03, grad_scale: 32.0 2023-11-19 05:12:04,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=581413.3333333334, ans=0.125 2023-11-19 05:12:29,186 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.531e-03 2023-11-19 05:12:36,527 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 05:12:51,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=581680.0, ans=0.125 2023-11-19 05:12:58,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=581680.0, ans=0.1 2023-11-19 05:12:59,882 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 3100, loss[loss=0.08197, simple_loss=0.09953, pruned_loss=0.02218, audio_tagging_loss=0.01003, over 15411.00 frames. ], tot_loss[loss=0.09198, simple_loss=0.11, pruned_loss=0.02623, audio_tagging_loss=0.01076, over 3042418.21 frames. ], batch size: 58, lr: 8.81e-03, grad_scale: 32.0 2023-11-19 05:13:02,922 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.43 vs. limit=8.0 2023-11-19 05:13:24,919 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.266e+01 8.767e+01 9.718e+01 1.090e+02 1.747e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-19 05:13:31,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=581880.0, ans=0.125 2023-11-19 05:13:38,252 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.97 vs. 
limit=15.0 2023-11-19 05:13:51,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=582013.3333333334, ans=0.125 2023-11-19 05:13:51,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=582013.3333333334, ans=0.0 2023-11-19 05:13:55,665 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 3150, loss[loss=0.07964, simple_loss=0.09492, pruned_loss=0.02055, audio_tagging_loss=0.01163, over 14471.00 frames. ], tot_loss[loss=0.09181, simple_loss=0.1099, pruned_loss=0.02604, audio_tagging_loss=0.01081, over 3047649.45 frames. ], batch size: 57, lr: 8.80e-03, grad_scale: 16.0 2023-11-19 05:13:56,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=582080.0, ans=0.05 2023-11-19 05:14:44,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=582346.6666666666, ans=0.125 2023-11-19 05:14:48,130 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.38 vs. limit=22.5 2023-11-19 05:14:51,789 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 3200, loss[loss=0.07675, simple_loss=0.09394, pruned_loss=0.01867, audio_tagging_loss=0.01111, over 14901.00 frames. ], tot_loss[loss=0.0906, simple_loss=0.1084, pruned_loss=0.02551, audio_tagging_loss=0.01091, over 3048045.80 frames. ], batch size: 55, lr: 8.80e-03, grad_scale: 32.0 2023-11-19 05:15:14,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=582546.6666666666, ans=0.0 2023-11-19 05:15:15,768 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.369e+01 9.165e+01 1.003e+02 1.359e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-19 05:15:17,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=582546.6666666666, ans=0.125 2023-11-19 05:15:47,234 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 3250, loss[loss=0.09247, simple_loss=0.1175, pruned_loss=0.02335, audio_tagging_loss=0.01039, over 16547.00 frames. ], tot_loss[loss=0.09126, simple_loss=0.1093, pruned_loss=0.02568, audio_tagging_loss=0.01095, over 3051414.13 frames. ], batch size: 61, lr: 8.80e-03, grad_scale: 32.0 2023-11-19 05:15:52,913 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.70 vs. limit=12.0 2023-11-19 05:15:56,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=582746.6666666666, ans=0.2 2023-11-19 05:16:22,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=582946.6666666666, ans=0.2 2023-11-19 05:16:24,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=582946.6666666666, ans=0.1 2023-11-19 05:16:26,778 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:16:30,154 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.79 vs. 
limit=15.0 2023-11-19 05:16:39,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=583013.3333333334, ans=0.0 2023-11-19 05:16:42,947 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 3300, loss[loss=0.08626, simple_loss=0.09919, pruned_loss=0.02574, audio_tagging_loss=0.01092, over 15164.00 frames. ], tot_loss[loss=0.09164, simple_loss=0.1095, pruned_loss=0.02583, audio_tagging_loss=0.01106, over 3055434.24 frames. ], batch size: 58, lr: 8.80e-03, grad_scale: 32.0 2023-11-19 05:16:47,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=583080.0, ans=0.04949747468305833 2023-11-19 05:16:55,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=583146.6666666666, ans=0.07 2023-11-19 05:17:02,820 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.64 vs. limit=15.0 2023-11-19 05:17:03,962 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0 2023-11-19 05:17:03,982 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.16 vs. limit=15.0 2023-11-19 05:17:08,699 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.018e+01 8.571e+01 9.137e+01 1.011e+02 1.807e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-19 05:17:11,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=583213.3333333334, ans=0.125 2023-11-19 05:17:28,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=583346.6666666666, ans=0.125 2023-11-19 05:17:38,944 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 3350, loss[loss=0.08159, simple_loss=0.09091, pruned_loss=0.02211, audio_tagging_loss=0.01402, over 15845.00 frames. ], tot_loss[loss=0.09143, simple_loss=0.1092, pruned_loss=0.02589, audio_tagging_loss=0.01095, over 3061924.62 frames. ], batch size: 60, lr: 8.79e-03, grad_scale: 16.0 2023-11-19 05:18:18,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=583613.3333333334, ans=0.0 2023-11-19 05:18:28,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=583680.0, ans=0.125 2023-11-19 05:18:34,536 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 3400, loss[loss=0.09487, simple_loss=0.1049, pruned_loss=0.0304, audio_tagging_loss=0.01201, over 15189.00 frames. ], tot_loss[loss=0.09194, simple_loss=0.1102, pruned_loss=0.02616, audio_tagging_loss=0.01067, over 3062089.13 frames. 
], batch size: 57, lr: 8.79e-03, grad_scale: 16.0 2023-11-19 05:18:52,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=583813.3333333334, ans=0.09899494936611666 2023-11-19 05:19:00,608 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.104e+01 8.466e+01 9.111e+01 1.006e+02 1.690e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-19 05:19:01,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=583880.0, ans=0.125 2023-11-19 05:19:05,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=583880.0, ans=0.07 2023-11-19 05:19:23,922 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=584013.3333333334, ans=0.125 2023-11-19 05:19:25,347 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.54 vs. limit=15.0 2023-11-19 05:19:30,491 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 3450, loss[loss=0.1077, simple_loss=0.1386, pruned_loss=0.03107, audio_tagging_loss=0.007331, over 15058.00 frames. ], tot_loss[loss=0.09117, simple_loss=0.1096, pruned_loss=0.02576, audio_tagging_loss=0.01061, over 3055693.73 frames. ], batch size: 56, lr: 8.79e-03, grad_scale: 16.0 2023-11-19 05:19:35,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=584080.0, ans=0.0 2023-11-19 05:19:37,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=584080.0, ans=0.1 2023-11-19 05:19:44,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=584146.6666666666, ans=0.2 2023-11-19 05:19:58,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=584213.3333333334, ans=0.025 2023-11-19 05:20:01,071 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:20:07,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=584280.0, ans=0.125 2023-11-19 05:20:27,026 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 3500, loss[loss=0.09951, simple_loss=0.1098, pruned_loss=0.03691, audio_tagging_loss=0.00769, over 15216.00 frames. ], tot_loss[loss=0.09167, simple_loss=0.1108, pruned_loss=0.02592, audio_tagging_loss=0.01037, over 3047602.85 frames. ], batch size: 58, lr: 8.79e-03, grad_scale: 16.0 2023-11-19 05:20:49,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=584546.6666666666, ans=0.1 2023-11-19 05:20:52,007 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.193e+01 8.420e+01 9.270e+01 9.989e+01 2.188e+02, threshold=1.854e+02, percent-clipped=1.0 2023-11-19 05:20:54,689 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
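
The WARNING above is the short-utterance filter at work: this AudioSet cut has only 100 feature frames, which shrink to 23 frames after front-end subsampling, fewer than its 24 BPE tokens, so the transducer loss could not align it. A minimal sketch of such a check (keep_cut is a hypothetical helper, not the training script's exact code; the subsampling arithmetic is an assumption consistent with the logged 100 -> 23):

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Convolutional front-end with overall subsampling factor 4:
        # 100 input frames -> 23 encoder frames, as in the warning above.
        frames_after_subsampling = ((num_frames - 7) // 2 + 1) // 2
        # The transducer needs at least one encoder frame per output token.
        return frames_after_subsampling >= num_tokens

    print(keep_cut(100, 24))   # False -> the cut is excluded, as logged
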
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 05:20:55,368 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.45 vs. limit=15.0 2023-11-19 05:21:12,340 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.45 vs. limit=22.5 2023-11-19 05:21:23,082 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 3550, loss[loss=0.06059, simple_loss=0.06632, pruned_loss=0.01601, audio_tagging_loss=0.01142, over 13903.00 frames. ], tot_loss[loss=0.09117, simple_loss=0.11, pruned_loss=0.02572, audio_tagging_loss=0.01046, over 3042550.63 frames. ], batch size: 54, lr: 8.78e-03, grad_scale: 16.0 2023-11-19 05:21:27,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=584746.6666666666, ans=0.1 2023-11-19 05:22:19,042 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 3600, loss[loss=0.06602, simple_loss=0.06774, pruned_loss=0.0155, audio_tagging_loss=0.01664, over 14162.00 frames. ], tot_loss[loss=0.0908, simple_loss=0.1092, pruned_loss=0.02563, audio_tagging_loss=0.01058, over 3039299.34 frames. ], batch size: 55, lr: 8.78e-03, grad_scale: 32.0 2023-11-19 05:22:30,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=585146.6666666666, ans=0.2 2023-11-19 05:22:33,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=585146.6666666666, ans=0.125 2023-11-19 05:22:44,449 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.417e+01 9.022e+01 1.001e+02 1.493e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-19 05:22:53,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=585280.0, ans=0.0 2023-11-19 05:22:56,134 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.93 vs. limit=15.0 2023-11-19 05:23:06,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=585346.6666666666, ans=0.07 2023-11-19 05:23:12,463 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:23:15,472 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 3650, loss[loss=0.08857, simple_loss=0.09779, pruned_loss=0.02981, audio_tagging_loss=0.009872, over 15307.00 frames. ], tot_loss[loss=0.09067, simple_loss=0.1089, pruned_loss=0.02566, audio_tagging_loss=0.01055, over 3043061.15 frames. ], batch size: 58, lr: 8.78e-03, grad_scale: 32.0 2023-11-19 05:23:35,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=585480.0, ans=0.125 2023-11-19 05:23:38,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=585546.6666666666, ans=0.0 2023-11-19 05:23:45,827 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.97 vs. 
limit=22.5 2023-11-19 05:23:46,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=585546.6666666666, ans=0.0 2023-11-19 05:23:51,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=585613.3333333334, ans=0.125 2023-11-19 05:24:03,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=585680.0, ans=0.1 2023-11-19 05:24:07,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=585680.0, ans=0.125 2023-11-19 05:24:08,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=585680.0, ans=0.125 2023-11-19 05:24:10,699 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 3700, loss[loss=0.06006, simple_loss=0.0649, pruned_loss=0.01672, audio_tagging_loss=0.01089, over 14472.00 frames. ], tot_loss[loss=0.09026, simple_loss=0.1083, pruned_loss=0.02559, audio_tagging_loss=0.0105, over 3045663.13 frames. ], batch size: 57, lr: 8.78e-03, grad_scale: 32.0 2023-11-19 05:24:13,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=585746.6666666666, ans=0.1 2023-11-19 05:24:35,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=585880.0, ans=0.125 2023-11-19 05:24:37,262 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.128e+01 8.524e+01 9.099e+01 9.813e+01 1.282e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-19 05:24:49,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=585946.6666666666, ans=0.125 2023-11-19 05:25:01,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=586013.3333333334, ans=0.07 2023-11-19 05:25:01,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=586013.3333333334, ans=0.0 2023-11-19 05:25:06,476 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 3750, loss[loss=0.08136, simple_loss=0.0914, pruned_loss=0.02312, audio_tagging_loss=0.01254, over 14948.00 frames. ], tot_loss[loss=0.09211, simple_loss=0.1106, pruned_loss=0.02635, audio_tagging_loss=0.01047, over 3054595.33 frames. ], batch size: 55, lr: 8.77e-03, grad_scale: 32.0 2023-11-19 05:25:15,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=586080.0, ans=0.125 2023-11-19 05:25:15,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=586080.0, ans=0.125 2023-11-19 05:25:44,617 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 05:25:44,671 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=586280.0, ans=0.125 2023-11-19 05:25:45,938 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:26:03,059 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 3800, loss[loss=0.08233, simple_loss=0.09848, pruned_loss=0.02139, audio_tagging_loss=0.0117, over 15061.00 frames. ], tot_loss[loss=0.09093, simple_loss=0.1089, pruned_loss=0.02579, audio_tagging_loss=0.01068, over 3045202.28 frames. ], batch size: 57, lr: 8.77e-03, grad_scale: 32.0 2023-11-19 05:26:15,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=586480.0, ans=0.2 2023-11-19 05:26:21,251 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:26:27,759 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.211e+01 8.964e+01 1.013e+02 1.295e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-19 05:26:35,111 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.89 vs. limit=22.5 2023-11-19 05:26:43,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=586613.3333333334, ans=0.95 2023-11-19 05:26:44,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=586613.3333333334, ans=0.1 2023-11-19 05:26:50,263 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=586680.0, ans=0.125 2023-11-19 05:26:57,794 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.71 vs. limit=15.0 2023-11-19 05:27:00,381 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 3850, loss[loss=0.1062, simple_loss=0.1286, pruned_loss=0.03101, audio_tagging_loss=0.01085, over 16634.00 frames. ], tot_loss[loss=0.09109, simple_loss=0.1089, pruned_loss=0.02583, audio_tagging_loss=0.01079, over 3042021.46 frames. ], batch size: 61, lr: 8.77e-03, grad_scale: 32.0 2023-11-19 05:27:08,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=586746.6666666666, ans=0.1 2023-11-19 05:27:20,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=586813.3333333334, ans=0.1 2023-11-19 05:27:40,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=586946.6666666666, ans=0.0 2023-11-19 05:27:56,350 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 3900, loss[loss=0.09878, simple_loss=0.1134, pruned_loss=0.03161, audio_tagging_loss=0.01045, over 15299.00 frames. ], tot_loss[loss=0.09105, simple_loss=0.1088, pruned_loss=0.02579, audio_tagging_loss=0.01087, over 3040079.21 frames. ], batch size: 57, lr: 8.77e-03, grad_scale: 32.0 2023-11-19 05:27:58,053 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.63 vs. 
limit=22.5 2023-11-19 05:28:10,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=587146.6666666666, ans=0.0 2023-11-19 05:28:22,402 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.241e+01 8.532e+01 9.315e+01 9.987e+01 1.482e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-19 05:28:52,882 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 3950, loss[loss=0.0933, simple_loss=0.1088, pruned_loss=0.02405, audio_tagging_loss=0.01483, over 14762.00 frames. ], tot_loss[loss=0.09094, simple_loss=0.1084, pruned_loss=0.02571, audio_tagging_loss=0.011, over 3036719.79 frames. ], batch size: 54, lr: 8.76e-03, grad_scale: 32.0 2023-11-19 05:29:17,981 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:29:23,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=587546.6666666666, ans=0.2 2023-11-19 05:29:32,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=587613.3333333334, ans=0.0 2023-11-19 05:29:37,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=587680.0, ans=0.125 2023-11-19 05:29:47,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=587746.6666666666, ans=0.125 2023-11-19 05:29:48,576 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 4000, loss[loss=0.0737, simple_loss=0.08078, pruned_loss=0.01663, audio_tagging_loss=0.01668, over 15453.00 frames. ], tot_loss[loss=0.09126, simple_loss=0.1092, pruned_loss=0.02572, audio_tagging_loss=0.01096, over 3050094.29 frames. ], batch size: 60, lr: 8.76e-03, grad_scale: 32.0 2023-11-19 05:29:49,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=587746.6666666666, ans=0.125 2023-11-19 05:29:58,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=587813.3333333334, ans=0.0 2023-11-19 05:30:00,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=587813.3333333334, ans=0.125 2023-11-19 05:30:14,660 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.252e+01 8.556e+01 9.306e+01 1.024e+02 1.346e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-19 05:30:40,371 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.22 vs. limit=15.0 2023-11-19 05:30:44,077 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 4050, loss[loss=0.09582, simple_loss=0.1089, pruned_loss=0.02859, audio_tagging_loss=0.01276, over 13830.00 frames. ], tot_loss[loss=0.09191, simple_loss=0.1103, pruned_loss=0.0259, audio_tagging_loss=0.01086, over 3048754.80 frames. ], batch size: 56, lr: 8.76e-03, grad_scale: 32.0 2023-11-19 05:30:46,198 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 05:31:15,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=588213.3333333334, ans=0.5 2023-11-19 05:31:19,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=588280.0, ans=0.0 2023-11-19 05:31:21,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=588280.0, ans=0.025 2023-11-19 05:31:30,337 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.31 vs. limit=15.0 2023-11-19 05:31:34,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=588346.6666666666, ans=0.125 2023-11-19 05:31:40,827 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 4100, loss[loss=0.08896, simple_loss=0.1068, pruned_loss=0.02616, audio_tagging_loss=0.009377, over 14398.00 frames. ], tot_loss[loss=0.09133, simple_loss=0.1096, pruned_loss=0.02571, audio_tagging_loss=0.01083, over 3047347.64 frames. ], batch size: 54, lr: 8.76e-03, grad_scale: 32.0 2023-11-19 05:31:48,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=588413.3333333334, ans=0.2 2023-11-19 05:31:55,419 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:31:56,497 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=588480.0, ans=0.0 2023-11-19 05:32:05,812 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.987e+01 8.777e+01 9.323e+01 1.010e+02 1.338e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-19 05:32:08,187 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=588546.6666666666, ans=0.0 2023-11-19 05:32:11,684 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.65 vs. limit=15.0 2023-11-19 05:32:21,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=588613.3333333334, ans=0.0 2023-11-19 05:32:36,888 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 4150, loss[loss=0.1078, simple_loss=0.1332, pruned_loss=0.03077, audio_tagging_loss=0.01038, over 14642.00 frames. ], tot_loss[loss=0.09123, simple_loss=0.1095, pruned_loss=0.02577, audio_tagging_loss=0.01071, over 3045980.07 frames. ], batch size: 56, lr: 8.75e-03, grad_scale: 32.0 2023-11-19 05:32:42,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=588746.6666666666, ans=0.125 2023-11-19 05:33:16,870 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 05:33:31,685 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 4200, loss[loss=0.07609, simple_loss=0.08832, pruned_loss=0.01902, audio_tagging_loss=0.01291, over 14421.00 frames. ], tot_loss[loss=0.09097, simple_loss=0.1091, pruned_loss=0.02576, audio_tagging_loss=0.01067, over 3042460.33 frames. ], batch size: 55, lr: 8.75e-03, grad_scale: 32.0 2023-11-19 05:33:47,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=589146.6666666666, ans=0.2 2023-11-19 05:33:58,305 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.731e+01 8.836e+01 9.609e+01 1.061e+02 1.544e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-19 05:34:03,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=589213.3333333334, ans=0.0 2023-11-19 05:34:13,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=589280.0, ans=0.125 2023-11-19 05:34:28,182 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 4250, loss[loss=0.07242, simple_loss=0.0891, pruned_loss=0.01571, audio_tagging_loss=0.01216, over 14906.00 frames. ], tot_loss[loss=0.09066, simple_loss=0.1089, pruned_loss=0.02557, audio_tagging_loss=0.01064, over 3042169.18 frames. ], batch size: 56, lr: 8.75e-03, grad_scale: 32.0 2023-11-19 05:34:34,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=589413.3333333334, ans=0.125 2023-11-19 05:34:43,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=589480.0, ans=0.1 2023-11-19 05:34:50,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=589546.6666666666, ans=0.0 2023-11-19 05:34:50,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=589546.6666666666, ans=0.0 2023-11-19 05:34:51,150 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.33 vs. limit=15.0 2023-11-19 05:35:01,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=589613.3333333334, ans=0.125 2023-11-19 05:35:01,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=589613.3333333334, ans=0.2 2023-11-19 05:35:24,460 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 4300, loss[loss=0.09277, simple_loss=0.112, pruned_loss=0.02576, audio_tagging_loss=0.01102, over 14799.00 frames. ], tot_loss[loss=0.09071, simple_loss=0.1094, pruned_loss=0.02554, audio_tagging_loss=0.01047, over 3050395.09 frames. 
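
The optim.py:476 lines ("Clipping_scale=2.0, grad-norm quartiles ... threshold=... percent-clipped=...") report statistics of recently observed gradient norms; the five numbers read naturally as min/25%/median/75%/max, and in every entry the threshold equals 2.0 times the middle value (e.g. 2.0 * 9.609e+01 = 1.922e+02 just above), i.e. the clipping threshold appears to track clipping_scale times the median grad norm. A sketch of that relationship (an interpretation of the log, not the optimizer's verbatim code):

    import torch

    def clipping_threshold(recent_grad_norms: torch.Tensor,
                           clipping_scale: float = 2.0) -> float:
        # Threshold follows clipping_scale * median of recent grad norms,
        # e.g. 2.0 * 9.609e+01 = 1.922e+02 as in the entry above.
        return clipping_scale * recent_grad_norms.median().item()

    # percent-clipped then reports the fraction of recent batches whose grad
    # norm exceeded this threshold (0.0 throughout this stretch of the log).
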
], batch size: 56, lr: 8.75e-03, grad_scale: 32.0 2023-11-19 05:35:35,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=589813.3333333334, ans=0.0 2023-11-19 05:35:49,994 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.991e+01 8.689e+01 9.452e+01 1.019e+02 1.393e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-19 05:35:56,522 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=589946.6666666666, ans=0.0 2023-11-19 05:36:07,699 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:36:16,067 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:36:18,938 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 4350, loss[loss=0.113, simple_loss=0.1362, pruned_loss=0.03537, audio_tagging_loss=0.009532, over 16710.00 frames. ], tot_loss[loss=0.09143, simple_loss=0.1102, pruned_loss=0.02587, audio_tagging_loss=0.01047, over 3045648.65 frames. ], batch size: 60, lr: 8.74e-03, grad_scale: 16.0 2023-11-19 05:36:35,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=590146.6666666666, ans=0.125 2023-11-19 05:36:41,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=590213.3333333334, ans=0.2 2023-11-19 05:36:52,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=590280.0, ans=0.125 2023-11-19 05:36:53,282 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=12.0 2023-11-19 05:37:14,720 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 4400, loss[loss=0.0916, simple_loss=0.1189, pruned_loss=0.02296, audio_tagging_loss=0.009193, over 16507.00 frames. ], tot_loss[loss=0.09168, simple_loss=0.1104, pruned_loss=0.02597, audio_tagging_loss=0.01051, over 3041334.51 frames. ], batch size: 61, lr: 8.74e-03, grad_scale: 32.0 2023-11-19 05:37:15,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=590413.3333333334, ans=0.1 2023-11-19 05:37:41,048 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.028e+01 8.473e+01 9.223e+01 1.006e+02 1.257e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-19 05:37:43,880 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.27 vs. limit=10.0 2023-11-19 05:37:58,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.98 vs. limit=6.0 2023-11-19 05:38:11,086 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 4450, loss[loss=0.07208, simple_loss=0.09057, pruned_loss=0.01768, audio_tagging_loss=0.00911, over 15259.00 frames. ], tot_loss[loss=0.09139, simple_loss=0.11, pruned_loss=0.02583, audio_tagging_loss=0.01055, over 3047007.02 frames. 
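
The scaling.py:213 ScheduledFloat entries track hyperparameters (dropout rates, skip rates, balancer probabilities) whose current value ("ans") is scheduled on batch_count. A minimal sketch of a piecewise-linear schedule of this kind (the breakpoints below are illustrative, not values taken from this run):

    def scheduled_float(batch_count: float,
                        points: list[tuple[float, float]]) -> float:
        # points: sorted (batch_count, value) breakpoints; linear in between,
        # clamped to the end values outside the range.
        if batch_count <= points[0][0]:
            return points[0][1]
        if batch_count >= points[-1][0]:
            return points[-1][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # Long past the ramp, a skip rate scheduled from 0.2 down to 0.0 reads 0.0,
    # matching the many "skip_rate ... ans=0.0" entries above.
    print(scheduled_float(589946.6, [(0.0, 0.2), (4000.0, 0.0)]))   # 0.0
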
], batch size: 55, lr: 8.74e-03, grad_scale: 32.0 2023-11-19 05:38:14,439 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=590746.6666666666, ans=0.125 2023-11-19 05:38:16,672 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=590746.6666666666, ans=0.05 2023-11-19 05:38:18,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=590746.6666666666, ans=0.0 2023-11-19 05:38:43,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=590946.6666666666, ans=0.1 2023-11-19 05:38:46,695 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=590946.6666666666, ans=0.125 2023-11-19 05:38:57,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=591013.3333333334, ans=0.2 2023-11-19 05:39:03,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=591013.3333333334, ans=0.125 2023-11-19 05:39:06,188 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 4500, loss[loss=0.09897, simple_loss=0.1306, pruned_loss=0.02606, audio_tagging_loss=0.007621, over 15231.00 frames. ], tot_loss[loss=0.09124, simple_loss=0.1097, pruned_loss=0.02581, audio_tagging_loss=0.01057, over 3034296.02 frames. ], batch size: 57, lr: 8.74e-03, grad_scale: 32.0 2023-11-19 05:39:09,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=591080.0, ans=0.125 2023-11-19 05:39:18,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=591146.6666666666, ans=0.0 2023-11-19 05:39:24,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=591146.6666666666, ans=0.2 2023-11-19 05:39:25,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=591146.6666666666, ans=0.125 2023-11-19 05:39:25,439 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:39:33,307 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.215e+01 8.438e+01 9.340e+01 1.045e+02 1.489e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-19 05:39:47,209 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2023-11-19 05:39:47,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=591280.0, ans=0.04949747468305833 2023-11-19 05:39:53,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=591346.6666666666, ans=0.0 2023-11-19 05:39:54,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=591346.6666666666, ans=0.125 2023-11-19 05:39:54,784 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. 
limit=6.0 2023-11-19 05:40:02,479 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 4550, loss[loss=0.08971, simple_loss=0.1115, pruned_loss=0.02374, audio_tagging_loss=0.01022, over 14148.00 frames. ], tot_loss[loss=0.08999, simple_loss=0.108, pruned_loss=0.02528, audio_tagging_loss=0.01069, over 3032395.21 frames. ], batch size: 54, lr: 8.73e-03, grad_scale: 32.0 2023-11-19 05:40:13,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=591480.0, ans=0.125 2023-11-19 05:40:23,402 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.73 vs. limit=15.0 2023-11-19 05:40:38,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=591613.3333333334, ans=0.125 2023-11-19 05:40:44,876 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 05:40:46,204 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=591680.0, ans=0.0 2023-11-19 05:40:56,152 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=591680.0, ans=0.125 2023-11-19 05:40:58,035 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 4600, loss[loss=0.09402, simple_loss=0.1172, pruned_loss=0.0244, audio_tagging_loss=0.011, over 16175.00 frames. ], tot_loss[loss=0.08937, simple_loss=0.107, pruned_loss=0.02505, audio_tagging_loss=0.01082, over 3031977.93 frames. ], batch size: 61, lr: 8.73e-03, grad_scale: 16.0 2023-11-19 05:41:25,349 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.649e+01 8.473e+01 9.207e+01 1.048e+02 1.617e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-19 05:41:27,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=591880.0, ans=0.125 2023-11-19 05:41:29,518 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.39 vs. limit=15.0 2023-11-19 05:41:42,189 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.42 vs. limit=10.0 2023-11-19 05:41:52,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=592080.0, ans=0.125 2023-11-19 05:41:53,876 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 4650, loss[loss=0.1044, simple_loss=0.1182, pruned_loss=0.03521, audio_tagging_loss=0.01012, over 13746.00 frames. ], tot_loss[loss=0.08933, simple_loss=0.107, pruned_loss=0.025, audio_tagging_loss=0.01084, over 3033343.96 frames. 
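
The scaling.py:1022 Whitening lines compare a per-module "whiteness" metric against a limit (e.g. metric=5.73 vs. limit=15.0 above). One plausible formulation of such a metric, offered only as a sketch: with lambda_i the eigenvalues of the feature covariance, mean(lambda**2) / mean(lambda)**2 equals 1.0 for a perfectly white (isotropic) covariance and grows as the spectrum concentrates in a few directions:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations from one module.
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]            # channel covariance
        lam = torch.linalg.eigvalsh(cov)        # real eigenvalues, ascending
        return ((lam ** 2).mean() / (lam.mean() ** 2 + 1e-20)).item()

    x = torch.randn(1000, 256)                  # near-white input
    print(whitening_metric(x))                  # ~1.3, well under a limit of 15.0
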
], batch size: 54, lr: 8.73e-03, grad_scale: 16.0 2023-11-19 05:41:55,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=592080.0, ans=0.125 2023-11-19 05:42:10,091 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=592146.6666666666, ans=0.0 2023-11-19 05:42:35,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=592280.0, ans=0.125 2023-11-19 05:42:35,777 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.13 vs. limit=15.0 2023-11-19 05:42:37,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=592346.6666666666, ans=0.0 2023-11-19 05:42:49,075 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 4700, loss[loss=0.07854, simple_loss=0.08404, pruned_loss=0.02218, audio_tagging_loss=0.01433, over 16066.00 frames. ], tot_loss[loss=0.08988, simple_loss=0.1077, pruned_loss=0.02517, audio_tagging_loss=0.01084, over 3041656.80 frames. ], batch size: 65, lr: 8.73e-03, grad_scale: 16.0 2023-11-19 05:43:07,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=592480.0, ans=0.0 2023-11-19 05:43:07,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=592480.0, ans=0.125 2023-11-19 05:43:17,542 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.009e+01 8.535e+01 9.340e+01 1.049e+02 1.659e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-19 05:43:25,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=592613.3333333334, ans=0.0 2023-11-19 05:43:28,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=592613.3333333334, ans=0.125 2023-11-19 05:43:35,675 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=592680.0, ans=0.125 2023-11-19 05:43:45,574 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 4750, loss[loss=0.08479, simple_loss=0.106, pruned_loss=0.01953, audio_tagging_loss=0.01226, over 15276.00 frames. ], tot_loss[loss=0.09059, simple_loss=0.1088, pruned_loss=0.02534, audio_tagging_loss=0.01086, over 3037342.26 frames. ], batch size: 56, lr: 8.72e-03, grad_scale: 16.0 2023-11-19 05:44:07,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=592880.0, ans=0.125 2023-11-19 05:44:33,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=593013.3333333334, ans=0.125 2023-11-19 05:44:41,341 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 4800, loss[loss=0.08866, simple_loss=0.09903, pruned_loss=0.02472, audio_tagging_loss=0.01443, over 16211.00 frames. ], tot_loss[loss=0.09083, simple_loss=0.1088, pruned_loss=0.0254, audio_tagging_loss=0.01105, over 3044207.68 frames. 
], batch size: 59, lr: 8.72e-03, grad_scale: 32.0 2023-11-19 05:44:47,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=593080.0, ans=0.125 2023-11-19 05:44:48,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=593080.0, ans=0.125 2023-11-19 05:44:52,581 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.24 vs. limit=15.0 2023-11-19 05:44:56,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=593146.6666666666, ans=0.0 2023-11-19 05:45:01,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=593146.6666666666, ans=0.0 2023-11-19 05:45:05,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=593213.3333333334, ans=0.125 2023-11-19 05:45:08,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=593213.3333333334, ans=0.0 2023-11-19 05:45:08,999 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.082e+01 8.537e+01 9.216e+01 9.797e+01 1.751e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-19 05:45:15,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.94 vs. limit=10.0 2023-11-19 05:45:34,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=593346.6666666666, ans=0.2 2023-11-19 05:45:36,330 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 4850, loss[loss=0.07152, simple_loss=0.08442, pruned_loss=0.01673, audio_tagging_loss=0.01258, over 14750.00 frames. ], tot_loss[loss=0.09086, simple_loss=0.1085, pruned_loss=0.02542, audio_tagging_loss=0.01118, over 3038029.19 frames. ], batch size: 55, lr: 8.72e-03, grad_scale: 32.0 2023-11-19 05:46:17,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=593613.3333333334, ans=0.125 2023-11-19 05:46:22,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=593680.0, ans=0.2 2023-11-19 05:46:24,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=593680.0, ans=0.1 2023-11-19 05:46:27,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=593680.0, ans=0.02 2023-11-19 05:46:28,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=593680.0, ans=0.125 2023-11-19 05:46:31,475 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 4900, loss[loss=0.07853, simple_loss=0.1024, pruned_loss=0.01842, audio_tagging_loss=0.008898, over 15480.00 frames. ], tot_loss[loss=0.09096, simple_loss=0.1088, pruned_loss=0.02557, audio_tagging_loss=0.01097, over 3038592.38 frames. 
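
The grad_scale field in the tot_loss lines (16.0 and 32.0 in this stretch) rises and falls the way a dynamic fp16 loss scaler does: doubled after a run of stable steps, halved when an overflow is detected. A generic sketch using the standard torch.cuda.amp API (not the training script's exact loop; requires a CUDA device, as in this run):

    import torch

    model = torch.nn.Linear(8, 1).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    scaler = torch.cuda.amp.GradScaler()        # maintains the running grad_scale
    for _ in range(4):
        x = torch.randn(16, 8, device="cuda")
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(x).pow(2).mean()
        scaler.scale(loss).backward()           # backward on the scaled loss
        scaler.step(optimizer)                  # unscales grads; skips step on inf/nan
        scaler.update()                         # grows/shrinks the scale over time
    print(scaler.get_scale())                   # the quantity logged as grad_scale
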
], batch size: 57, lr: 8.72e-03, grad_scale: 32.0 2023-11-19 05:46:41,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=593813.3333333334, ans=0.1 2023-11-19 05:46:45,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=593813.3333333334, ans=0.1 2023-11-19 05:46:46,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=593813.3333333334, ans=0.1 2023-11-19 05:46:51,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=593813.3333333334, ans=0.125 2023-11-19 05:46:57,309 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=15.0 2023-11-19 05:46:58,716 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.195e+01 8.397e+01 9.302e+01 1.001e+02 1.305e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 05:47:26,551 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 4950, loss[loss=0.1048, simple_loss=0.1405, pruned_loss=0.02682, audio_tagging_loss=0.007734, over 15177.00 frames. ], tot_loss[loss=0.08984, simple_loss=0.1077, pruned_loss=0.02514, audio_tagging_loss=0.01084, over 3041397.91 frames. ], batch size: 57, lr: 8.71e-03, grad_scale: 32.0 2023-11-19 05:48:02,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=594280.0, ans=0.125 2023-11-19 05:48:16,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=594346.6666666666, ans=0.125 2023-11-19 05:48:22,027 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 5000, loss[loss=0.1012, simple_loss=0.1298, pruned_loss=0.02629, audio_tagging_loss=0.01001, over 16345.00 frames. ], tot_loss[loss=0.09002, simple_loss=0.1082, pruned_loss=0.02526, audio_tagging_loss=0.01064, over 3044884.52 frames. ], batch size: 59, lr: 8.71e-03, grad_scale: 16.0 2023-11-19 05:48:22,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=594413.3333333334, ans=0.125 2023-11-19 05:48:34,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=594480.0, ans=0.0 2023-11-19 05:48:51,121 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.368e+01 8.482e+01 9.400e+01 1.026e+02 1.313e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-19 05:49:00,022 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.45 vs. limit=10.0 2023-11-19 05:49:18,523 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 5050, loss[loss=0.05847, simple_loss=0.06652, pruned_loss=0.01213, audio_tagging_loss=0.01307, over 16256.00 frames. ], tot_loss[loss=0.08908, simple_loss=0.1069, pruned_loss=0.025, audio_tagging_loss=0.01062, over 3033639.87 frames. ], batch size: 61, lr: 8.71e-03, grad_scale: 16.0 2023-11-19 05:49:21,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.30 vs. 
limit=10.0 2023-11-19 05:49:24,050 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.39 vs. limit=15.0 2023-11-19 05:49:31,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=594813.3333333334, ans=0.125 2023-11-19 05:49:40,786 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.51 vs. limit=22.5 2023-11-19 05:49:41,961 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.82 vs. limit=6.0 2023-11-19 05:50:01,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=594946.6666666666, ans=0.125 2023-11-19 05:50:13,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=595080.0, ans=0.0 2023-11-19 05:50:14,332 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 5100, loss[loss=0.09808, simple_loss=0.1182, pruned_loss=0.02656, audio_tagging_loss=0.0124, over 15234.00 frames. ], tot_loss[loss=0.08915, simple_loss=0.1067, pruned_loss=0.02516, audio_tagging_loss=0.01064, over 3038357.85 frames. ], batch size: 56, lr: 8.71e-03, grad_scale: 16.0 2023-11-19 05:50:16,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=595080.0, ans=0.125 2023-11-19 05:50:40,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=595213.3333333334, ans=0.0 2023-11-19 05:50:43,361 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.142e+01 8.421e+01 9.048e+01 1.022e+02 1.450e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-19 05:50:44,175 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:50:53,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=595280.0, ans=0.125 2023-11-19 05:51:02,309 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.26 vs. limit=15.0 2023-11-19 05:51:03,689 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.14 vs. limit=22.5 2023-11-19 05:51:06,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=595346.6666666666, ans=0.0 2023-11-19 05:51:09,263 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 5150, loss[loss=0.09271, simple_loss=0.1203, pruned_loss=0.0233, audio_tagging_loss=0.009245, over 15558.00 frames. ], tot_loss[loss=0.08864, simple_loss=0.1062, pruned_loss=0.0249, audio_tagging_loss=0.01064, over 3044525.48 frames. ], batch size: 56, lr: 8.71e-03, grad_scale: 16.0 2023-11-19 05:51:16,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=595413.3333333334, ans=0.125 2023-11-19 05:51:33,005 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.06 vs. 
limit=15.0 2023-11-19 05:51:48,789 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.64 vs. limit=22.5 2023-11-19 05:51:59,618 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.70 vs. limit=15.0 2023-11-19 05:52:00,667 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=595680.0, ans=0.2 2023-11-19 05:52:05,712 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 5200, loss[loss=0.07007, simple_loss=0.08065, pruned_loss=0.01868, audio_tagging_loss=0.01107, over 14566.00 frames. ], tot_loss[loss=0.08937, simple_loss=0.107, pruned_loss=0.02525, audio_tagging_loss=0.01063, over 3041775.65 frames. ], batch size: 57, lr: 8.70e-03, grad_scale: 32.0 2023-11-19 05:52:12,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=595746.6666666666, ans=0.125 2023-11-19 05:52:16,113 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=22.5 2023-11-19 05:52:33,595 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.530e+01 8.322e+01 8.934e+01 9.832e+01 1.211e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-19 05:52:33,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=595880.0, ans=0.125 2023-11-19 05:52:46,339 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.86 vs. limit=15.0 2023-11-19 05:52:55,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=596013.3333333334, ans=0.0 2023-11-19 05:52:59,555 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=596013.3333333334, ans=0.2 2023-11-19 05:53:01,451 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 5250, loss[loss=0.08918, simple_loss=0.115, pruned_loss=0.02141, audio_tagging_loss=0.01027, over 15587.00 frames. ], tot_loss[loss=0.0906, simple_loss=0.1087, pruned_loss=0.02571, audio_tagging_loss=0.01055, over 3043640.32 frames. ], batch size: 57, lr: 8.70e-03, grad_scale: 32.0 2023-11-19 05:53:07,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=596080.0, ans=0.09899494936611666 2023-11-19 05:53:10,089 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=596080.0, ans=0.1 2023-11-19 05:53:10,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=596080.0, ans=0.2 2023-11-19 05:53:48,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=596346.6666666666, ans=0.05 2023-11-19 05:53:50,074 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=596346.6666666666, ans=10.0 2023-11-19 05:53:56,337 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 5300, loss[loss=0.07846, simple_loss=0.0941, pruned_loss=0.01818, audio_tagging_loss=0.01323, over 16293.00 frames. 
], tot_loss[loss=0.09097, simple_loss=0.1096, pruned_loss=0.0257, audio_tagging_loss=0.01047, over 3034831.92 frames. ], batch size: 62, lr: 8.70e-03, grad_scale: 16.0 2023-11-19 05:53:57,712 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=596413.3333333334, ans=0.0 2023-11-19 05:54:05,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=596413.3333333334, ans=0.0 2023-11-19 05:54:12,117 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.48 vs. limit=22.5 2023-11-19 05:54:23,345 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.51 vs. limit=15.0 2023-11-19 05:54:25,146 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=596546.6666666666, ans=0.125 2023-11-19 05:54:26,891 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.349e+01 8.841e+01 9.894e+01 1.112e+02 1.416e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-19 05:54:28,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=596546.6666666666, ans=0.125 2023-11-19 05:54:52,769 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 5350, loss[loss=0.06796, simple_loss=0.08379, pruned_loss=0.01734, audio_tagging_loss=0.008723, over 14415.00 frames. ], tot_loss[loss=0.09085, simple_loss=0.1096, pruned_loss=0.02556, audio_tagging_loss=0.01051, over 3040361.85 frames. ], batch size: 54, lr: 8.70e-03, grad_scale: 16.0 2023-11-19 05:54:56,105 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=596746.6666666666, ans=0.125 2023-11-19 05:55:05,217 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=596813.3333333334, ans=0.2 2023-11-19 05:55:14,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=596880.0, ans=0.125 2023-11-19 05:55:15,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=596880.0, ans=0.125 2023-11-19 05:55:20,164 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:55:48,445 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 5400, loss[loss=0.09727, simple_loss=0.1095, pruned_loss=0.03053, audio_tagging_loss=0.01197, over 14182.00 frames. ], tot_loss[loss=0.0913, simple_loss=0.1099, pruned_loss=0.02575, audio_tagging_loss=0.01062, over 3041981.15 frames. 
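
The lr column decays very slowly here (8.80e-03 at batch 3150 down to 8.70e-03 by batch 5250), which is consistent with an Eden-style schedule in which the learning rate decays smoothly in both the batch index and the (fractional) epoch. A sketch of that functional form (parameter names follow the usual icefall conventions; treat the exact formula as an assumption rather than a fact of this run):

    def eden_lr(base_lr: float, batch: float, epoch: float,
                lr_batches: float, lr_epochs: float) -> float:
        # Smooth decay in batch and epoch: roughly flat early on,
        # approaching ~batch**-0.5 once batch >> lr_batches.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor
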
], batch size: 57, lr: 8.69e-03, grad_scale: 16.0 2023-11-19 05:55:50,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=597080.0, ans=0.0 2023-11-19 05:55:54,006 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=597080.0, ans=0.2 2023-11-19 05:56:03,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=597146.6666666666, ans=0.0 2023-11-19 05:56:11,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=597213.3333333334, ans=0.0 2023-11-19 05:56:13,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=597213.3333333334, ans=0.05 2023-11-19 05:56:17,854 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.088e+01 8.569e+01 9.325e+01 1.031e+02 1.430e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-19 05:56:22,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=597280.0, ans=0.0 2023-11-19 05:56:22,535 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=597280.0, ans=0.125 2023-11-19 05:56:24,817 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.48 vs. limit=15.0 2023-11-19 05:56:28,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=597280.0, ans=0.0 2023-11-19 05:56:33,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=597346.6666666666, ans=0.0 2023-11-19 05:56:37,768 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0 2023-11-19 05:56:43,656 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 5450, loss[loss=0.109, simple_loss=0.1281, pruned_loss=0.03257, audio_tagging_loss=0.01234, over 14610.00 frames. ], tot_loss[loss=0.09225, simple_loss=0.1107, pruned_loss=0.02624, audio_tagging_loss=0.01065, over 3043337.11 frames. ], batch size: 55, lr: 8.69e-03, grad_scale: 16.0 2023-11-19 05:57:10,959 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.86 vs. limit=15.0 2023-11-19 05:57:20,338 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.14 vs. limit=10.0 2023-11-19 05:57:26,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=597613.3333333334, ans=0.1 2023-11-19 05:57:30,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=597680.0, ans=10.0 2023-11-19 05:57:39,855 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 5500, loss[loss=0.08461, simple_loss=0.1026, pruned_loss=0.02103, audio_tagging_loss=0.01227, over 15323.00 frames. ], tot_loss[loss=0.0917, simple_loss=0.1098, pruned_loss=0.02609, audio_tagging_loss=0.01069, over 3044165.86 frames. 
], batch size: 59, lr: 8.69e-03, grad_scale: 16.0 2023-11-19 05:57:41,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=597746.6666666666, ans=0.0 2023-11-19 05:57:45,593 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.46 vs. limit=15.0 2023-11-19 05:57:52,057 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.38 vs. limit=22.5 2023-11-19 05:57:52,817 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:58:09,525 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.153e+01 8.477e+01 9.706e+01 1.076e+02 1.326e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-19 05:58:27,087 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2023-11-19 05:58:28,017 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. limit=6.0 2023-11-19 05:58:35,506 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 5550, loss[loss=0.1124, simple_loss=0.1298, pruned_loss=0.03466, audio_tagging_loss=0.01285, over 16433.00 frames. ], tot_loss[loss=0.09197, simple_loss=0.11, pruned_loss=0.02614, audio_tagging_loss=0.01083, over 3044073.33 frames. ], batch size: 63, lr: 8.69e-03, grad_scale: 16.0 2023-11-19 05:59:01,018 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2023-11-19 05:59:22,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=598346.6666666666, ans=0.1 2023-11-19 05:59:30,944 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 5600, loss[loss=0.09776, simple_loss=0.1185, pruned_loss=0.02748, audio_tagging_loss=0.01102, over 15639.00 frames. ], tot_loss[loss=0.09222, simple_loss=0.1108, pruned_loss=0.02603, audio_tagging_loss=0.01081, over 3049971.47 frames. 
], batch size: 58, lr: 8.68e-03, grad_scale: 16.0 2023-11-19 05:59:43,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=598480.0, ans=0.0 2023-11-19 05:59:48,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=598480.0, ans=0.1 2023-11-19 05:59:48,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=598480.0, ans=0.2 2023-11-19 05:59:48,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=598480.0, ans=0.1 2023-11-19 05:59:56,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=598546.6666666666, ans=0.2 2023-11-19 06:00:01,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=598546.6666666666, ans=0.0 2023-11-19 06:00:02,416 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.036e+01 8.552e+01 9.221e+01 1.020e+02 1.317e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-19 06:00:10,778 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 06:00:17,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=598680.0, ans=0.0 2023-11-19 06:00:20,077 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. limit=6.0 2023-11-19 06:00:22,966 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=598680.0, ans=0.1 2023-11-19 06:00:27,008 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 5650, loss[loss=0.112, simple_loss=0.1443, pruned_loss=0.03131, audio_tagging_loss=0.008549, over 15972.00 frames. ], tot_loss[loss=0.09244, simple_loss=0.111, pruned_loss=0.026, audio_tagging_loss=0.01094, over 3051679.84 frames. 
], batch size: 57, lr: 8.68e-03, grad_scale: 16.0 2023-11-19 06:00:28,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=598746.6666666666, ans=0.2 2023-11-19 06:00:33,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=598746.6666666666, ans=0.0 2023-11-19 06:01:04,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=598946.6666666666, ans=0.125 2023-11-19 06:01:12,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=599013.3333333334, ans=0.125 2023-11-19 06:01:21,707 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=599080.0, ans=0.125 2023-11-19 06:01:22,499 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 5700, loss[loss=0.09449, simple_loss=0.1269, pruned_loss=0.02364, audio_tagging_loss=0.007423, over 15304.00 frames. ], tot_loss[loss=0.09192, simple_loss=0.1105, pruned_loss=0.02587, audio_tagging_loss=0.01082, over 3049553.17 frames. ], batch size: 56, lr: 8.68e-03, grad_scale: 16.0 2023-11-19 06:01:24,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=599080.0, ans=0.0 2023-11-19 06:01:26,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=599080.0, ans=0.035 2023-11-19 06:01:30,093 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.86 vs. limit=15.0 2023-11-19 06:01:53,408 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.633e+01 8.957e+01 9.901e+01 1.097e+02 1.583e+02, threshold=1.980e+02, percent-clipped=0.0 2023-11-19 06:01:57,896 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=599280.0, ans=0.025 2023-11-19 06:02:11,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=599346.6666666666, ans=0.0 2023-11-19 06:02:17,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=599413.3333333334, ans=0.125 2023-11-19 06:02:17,815 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 5750, loss[loss=0.08969, simple_loss=0.1017, pruned_loss=0.02759, audio_tagging_loss=0.01127, over 15773.00 frames. ], tot_loss[loss=0.0909, simple_loss=0.1089, pruned_loss=0.02568, audio_tagging_loss=0.01075, over 3048671.96 frames. ], batch size: 59, lr: 8.68e-03, grad_scale: 16.0 2023-11-19 06:02:23,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=599413.3333333334, ans=0.0 2023-11-19 06:02:24,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=599413.3333333334, ans=0.125 2023-11-19 06:02:39,362 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.13 vs. 
limit=15.0 2023-11-19 06:02:43,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=599546.6666666666, ans=0.0 2023-11-19 06:02:55,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=599613.3333333334, ans=0.09899494936611666 2023-11-19 06:02:56,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=599613.3333333334, ans=0.125 2023-11-19 06:02:57,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=599613.3333333334, ans=0.125 2023-11-19 06:03:10,723 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 06:03:13,150 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 5800, loss[loss=0.07133, simple_loss=0.08197, pruned_loss=0.01747, audio_tagging_loss=0.01287, over 16289.00 frames. ], tot_loss[loss=0.09132, simple_loss=0.1097, pruned_loss=0.0258, audio_tagging_loss=0.01066, over 3055448.26 frames. ], batch size: 63, lr: 8.67e-03, grad_scale: 16.0 2023-11-19 06:03:20,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=599746.6666666666, ans=0.125 2023-11-19 06:03:23,603 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.49 vs. limit=12.0 2023-11-19 06:03:36,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=599880.0, ans=0.125 2023-11-19 06:03:44,336 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.060e+01 9.064e+01 9.956e+01 1.074e+02 1.617e+02, threshold=1.991e+02, percent-clipped=0.0 2023-11-19 06:03:47,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=599946.6666666666, ans=0.125 2023-11-19 06:03:57,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=600013.3333333334, ans=0.2 2023-11-19 06:04:09,091 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 5850, loss[loss=0.1021, simple_loss=0.1261, pruned_loss=0.02851, audio_tagging_loss=0.01052, over 16021.00 frames. ], tot_loss[loss=0.09094, simple_loss=0.1096, pruned_loss=0.02563, audio_tagging_loss=0.01051, over 3060001.29 frames. ], batch size: 59, lr: 8.67e-03, grad_scale: 16.0 2023-11-19 06:04:11,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=600080.0, ans=0.125 2023-11-19 06:04:32,252 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. limit=6.0 2023-11-19 06:04:47,126 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.78 vs. limit=22.5 2023-11-19 06:04:47,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=600280.0, ans=0.0 2023-11-19 06:04:49,940 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.06 vs. 
limit=12.0 2023-11-19 06:05:01,516 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=600346.6666666666, ans=0.035 2023-11-19 06:05:02,884 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.96 vs. limit=22.5 2023-11-19 06:05:04,595 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 5900, loss[loss=0.08698, simple_loss=0.1076, pruned_loss=0.024, audio_tagging_loss=0.009188, over 14320.00 frames. ], tot_loss[loss=0.09128, simple_loss=0.1104, pruned_loss=0.0257, audio_tagging_loss=0.01037, over 3046352.28 frames. ], batch size: 52, lr: 8.67e-03, grad_scale: 16.0 2023-11-19 06:05:18,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=600480.0, ans=0.125 2023-11-19 06:05:22,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=600480.0, ans=0.125 2023-11-19 06:05:35,555 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.084e+01 8.269e+01 8.982e+01 9.905e+01 1.254e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-19 06:05:38,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=600613.3333333334, ans=0.125 2023-11-19 06:05:45,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=600613.3333333334, ans=0.1 2023-11-19 06:05:59,420 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 5950, loss[loss=0.08077, simple_loss=0.1044, pruned_loss=0.01977, audio_tagging_loss=0.008819, over 15435.00 frames. ], tot_loss[loss=0.09088, simple_loss=0.1099, pruned_loss=0.02565, audio_tagging_loss=0.01029, over 3050233.70 frames. ], batch size: 57, lr: 8.67e-03, grad_scale: 16.0 2023-11-19 06:06:22,720 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.33 vs. limit=15.0 2023-11-19 06:06:34,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=600946.6666666666, ans=0.0 2023-11-19 06:06:55,536 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 6000, loss[loss=0.09045, simple_loss=0.1102, pruned_loss=0.02634, audio_tagging_loss=0.008994, over 15121.00 frames. ], tot_loss[loss=0.09017, simple_loss=0.1087, pruned_loss=0.02542, audio_tagging_loss=0.01038, over 3051294.10 frames. ], batch size: 55, lr: 8.66e-03, grad_scale: 32.0 2023-11-19 06:06:55,537 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-19 06:07:18,603 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([0.7869, 3.3445, 2.5942, 2.8640, 3.6748, 3.7316, 2.8937, 3.5759], device='cuda:3') 2023-11-19 06:07:26,589 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.7099, 1.0784, 1.5347, 2.0907, 1.7598, 1.9527, 1.8145, 1.4570], device='cuda:3') 2023-11-19 06:07:28,378 INFO [train_asr.py:1147] (3/4) Epoch 8, validation: loss=0.06748, simple_loss=0.0569, pruned_loss=0.007185, audio_tagging_loss=0.03185, over 4681554.00 frames. 
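Two relationships in the entries above can be checked directly from the logged numbers. First, each logged total appears to be a weighted sum of its components, loss ≈ 0.5 * simple_loss + pruned_loss + audio_tagging_loss; the 0.5 and 1.0 weights are inferred from the numbers in this log, not read out of the training code. Second, the threshold printed in each optim.py grad-norm entry appears to equal the logged Clipping_scale times the middle of the five quartile values, i.e. twice the median grad norm. A minimal Python sketch (the helper name combined_loss is hypothetical) that verifies both against the most recent entries above:

    def combined_loss(simple, pruned, tagging, simple_scale=0.5, tagging_scale=1.0):
        # Weighted sum inferred from the logged values; the scales are assumptions.
        return simple_scale * simple + pruned + tagging_scale * tagging

    # Validation entry at 06:07:28 above: loss=0.06748, simple_loss=0.0569,
    # pruned_loss=0.007185, audio_tagging_loss=0.03185
    assert abs(combined_loss(0.0569, 0.007185, 0.03185) - 0.06748) < 1e-4

    # optim.py entry at 06:05:35 above: quartiles 7.084e+01 8.269e+01 8.982e+01
    # 9.905e+01 1.254e+02 with Clipping_scale=2.0 and threshold=1.796e+02;
    # the threshold matches 2.0 times the median (the middle value).
    quartiles = [7.084e1, 8.269e1, 8.982e1, 9.905e1, 1.254e2]
    assert abs(2.0 * quartiles[2] - 1.796e2) < 1e-1

The same checks hold for the other tot_loss and grad-norm entries in this section, so the pattern is consistent across batches rather than a one-off coincidence.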
2023-11-19 06:07:28,379 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-19 06:07:34,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=601080.0, ans=0.2 2023-11-19 06:07:38,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=601146.6666666666, ans=0.125 2023-11-19 06:07:50,396 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=601213.3333333334, ans=0.125 2023-11-19 06:07:57,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=601213.3333333334, ans=0.0 2023-11-19 06:07:59,099 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.682e+01 9.253e+01 9.954e+01 1.321e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-19 06:08:07,514 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 06:08:23,824 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 6050, loss[loss=0.1055, simple_loss=0.1285, pruned_loss=0.03309, audio_tagging_loss=0.008151, over 14939.00 frames. ], tot_loss[loss=0.09089, simple_loss=0.1099, pruned_loss=0.02559, audio_tagging_loss=0.01035, over 3050262.75 frames. ], batch size: 56, lr: 8.66e-03, grad_scale: 16.0 2023-11-19 06:08:38,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=601480.0, ans=0.0 2023-11-19 06:08:53,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=601546.6666666666, ans=0.04949747468305833 2023-11-19 06:08:56,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=601613.3333333334, ans=0.1 2023-11-19 06:09:18,655 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 6100, loss[loss=0.06898, simple_loss=0.07911, pruned_loss=0.01788, audio_tagging_loss=0.01155, over 13606.00 frames. ], tot_loss[loss=0.09044, simple_loss=0.1092, pruned_loss=0.02539, audio_tagging_loss=0.01044, over 3048010.79 frames. 
], batch size: 56, lr: 8.66e-03, grad_scale: 16.0 2023-11-19 06:09:26,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=601746.6666666666, ans=0.0 2023-11-19 06:09:28,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=601813.3333333334, ans=0.09899494936611666 2023-11-19 06:09:50,159 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.701e+01 8.581e+01 9.083e+01 1.023e+02 1.492e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-19 06:10:09,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=602013.3333333334, ans=0.125 2023-11-19 06:10:12,870 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 6150, loss[loss=0.09907, simple_loss=0.1189, pruned_loss=0.02674, audio_tagging_loss=0.01289, over 16664.00 frames. ], tot_loss[loss=0.09001, simple_loss=0.1088, pruned_loss=0.02522, audio_tagging_loss=0.01041, over 3049235.57 frames. ], batch size: 61, lr: 8.66e-03, grad_scale: 16.0 2023-11-19 06:10:13,584 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.36 vs. limit=22.5 2023-11-19 06:10:35,400 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 06:11:08,563 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 6200, loss[loss=0.1154, simple_loss=0.1285, pruned_loss=0.03887, audio_tagging_loss=0.01225, over 15398.00 frames. ], tot_loss[loss=0.09065, simple_loss=0.1089, pruned_loss=0.02561, audio_tagging_loss=0.01059, over 3049937.02 frames. ], batch size: 58, lr: 8.65e-03, grad_scale: 16.0 2023-11-19 06:11:09,845 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=602413.3333333334, ans=0.1 2023-11-19 06:11:13,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=602413.3333333334, ans=0.0 2023-11-19 06:11:18,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=602480.0, ans=0.1 2023-11-19 06:11:22,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=602480.0, ans=0.0 2023-11-19 06:11:23,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=602480.0, ans=0.125 2023-11-19 06:11:28,429 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=602480.0, ans=0.125 2023-11-19 06:11:33,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=602546.6666666666, ans=0.02 2023-11-19 06:11:39,726 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.753e+01 8.400e+01 8.989e+01 9.962e+01 1.274e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 06:11:41,387 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.48 vs. limit=22.5 2023-11-19 06:12:03,528 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 6250, loss[loss=0.08935, simple_loss=0.1109, pruned_loss=0.02256, audio_tagging_loss=0.01133, over 15801.00 frames. 
], tot_loss[loss=0.09019, simple_loss=0.1081, pruned_loss=0.02543, audio_tagging_loss=0.01072, over 3047793.12 frames. ], batch size: 58, lr: 8.65e-03, grad_scale: 16.0 2023-11-19 06:12:18,551 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=602813.3333333334, ans=0.125 2023-11-19 06:12:28,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=602880.0, ans=0.125 2023-11-19 06:12:58,167 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 6300, loss[loss=0.09298, simple_loss=0.1147, pruned_loss=0.02564, audio_tagging_loss=0.009994, over 16650.00 frames. ], tot_loss[loss=0.09045, simple_loss=0.1082, pruned_loss=0.02553, audio_tagging_loss=0.01083, over 3045896.40 frames. ], batch size: 60, lr: 8.65e-03, grad_scale: 16.0 2023-11-19 06:12:58,747 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0 2023-11-19 06:13:07,909 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=603146.6666666666, ans=0.2 2023-11-19 06:13:16,773 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.72 vs. limit=22.5 2023-11-19 06:13:22,735 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 06:13:30,519 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.240e+01 8.462e+01 9.271e+01 1.035e+02 1.313e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-19 06:13:35,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=603280.0, ans=0.125 2023-11-19 06:13:52,814 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 6350, loss[loss=0.07433, simple_loss=0.08396, pruned_loss=0.02043, audio_tagging_loss=0.01192, over 14776.00 frames. ], tot_loss[loss=0.08956, simple_loss=0.107, pruned_loss=0.02516, audio_tagging_loss=0.01088, over 3050339.19 frames. ], batch size: 56, lr: 8.65e-03, grad_scale: 16.0 2023-11-19 06:14:03,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=603480.0, ans=0.0 2023-11-19 06:14:10,904 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.36 vs. limit=15.0 2023-11-19 06:14:12,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=603480.0, ans=0.1 2023-11-19 06:14:26,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=603613.3333333334, ans=0.0 2023-11-19 06:14:36,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=603680.0, ans=0.0 2023-11-19 06:14:36,676 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.14 vs. limit=15.0 2023-11-19 06:14:48,828 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 6400, loss[loss=0.08859, simple_loss=0.1051, pruned_loss=0.02424, audio_tagging_loss=0.01178, over 16215.00 frames. 
], tot_loss[loss=0.08966, simple_loss=0.1068, pruned_loss=0.02529, audio_tagging_loss=0.01098, over 3050693.41 frames. ], batch size: 60, lr: 8.65e-03, grad_scale: 32.0 2023-11-19 06:15:03,365 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.063e-01 2023-11-19 06:15:06,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=603813.3333333334, ans=0.125 2023-11-19 06:15:12,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=603880.0, ans=0.125 2023-11-19 06:15:20,943 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 8.708e+01 9.476e+01 1.030e+02 1.332e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-19 06:15:44,368 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 6450, loss[loss=0.07456, simple_loss=0.08626, pruned_loss=0.01863, audio_tagging_loss=0.0128, over 14794.00 frames. ], tot_loss[loss=0.08982, simple_loss=0.107, pruned_loss=0.02521, audio_tagging_loss=0.01112, over 3044858.52 frames. ], batch size: 57, lr: 8.64e-03, grad_scale: 32.0 2023-11-19 06:15:49,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=604080.0, ans=0.125 2023-11-19 06:15:51,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=604080.0, ans=0.04949747468305833 2023-11-19 06:15:53,377 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.68 vs. limit=15.0 2023-11-19 06:16:00,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=604146.6666666666, ans=0.0 2023-11-19 06:16:00,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=604146.6666666666, ans=0.125 2023-11-19 06:16:32,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=604346.6666666666, ans=0.0 2023-11-19 06:16:32,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=604346.6666666666, ans=0.125 2023-11-19 06:16:39,329 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 6500, loss[loss=0.08366, simple_loss=0.09929, pruned_loss=0.02353, audio_tagging_loss=0.01048, over 15421.00 frames. ], tot_loss[loss=0.08976, simple_loss=0.1068, pruned_loss=0.02521, audio_tagging_loss=0.01115, over 3045879.38 frames. 
], batch size: 57, lr: 8.64e-03, grad_scale: 32.0 2023-11-19 06:16:44,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=604413.3333333334, ans=0.0 2023-11-19 06:16:58,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=604480.0, ans=0.0 2023-11-19 06:17:02,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=604546.6666666666, ans=0.125 2023-11-19 06:17:11,948 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.719e+01 8.554e+01 9.296e+01 1.013e+02 1.424e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 06:17:27,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=604680.0, ans=0.2 2023-11-19 06:17:35,738 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 6550, loss[loss=0.07771, simple_loss=0.09132, pruned_loss=0.0228, audio_tagging_loss=0.009241, over 15713.00 frames. ], tot_loss[loss=0.09061, simple_loss=0.108, pruned_loss=0.02568, audio_tagging_loss=0.01093, over 3046463.41 frames. ], batch size: 60, lr: 8.64e-03, grad_scale: 32.0 2023-11-19 06:17:47,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=604813.3333333334, ans=0.0 2023-11-19 06:17:48,390 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.62 vs. limit=10.0 2023-11-19 06:17:59,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=604880.0, ans=0.07 2023-11-19 06:18:00,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=604880.0, ans=0.0 2023-11-19 06:18:14,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=604946.6666666666, ans=0.1 2023-11-19 06:18:21,950 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=605013.3333333334, ans=0.125 2023-11-19 06:18:24,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=605013.3333333334, ans=0.125 2023-11-19 06:18:30,982 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.72 vs. limit=22.5 2023-11-19 06:18:31,326 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 6600, loss[loss=0.09111, simple_loss=0.1057, pruned_loss=0.02729, audio_tagging_loss=0.01096, over 15574.00 frames. ], tot_loss[loss=0.09066, simple_loss=0.1081, pruned_loss=0.02569, audio_tagging_loss=0.0109, over 3048077.11 frames. ], batch size: 56, lr: 8.64e-03, grad_scale: 32.0 2023-11-19 06:18:31,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=605080.0, ans=0.125 2023-11-19 06:18:35,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=605080.0, ans=0.1 2023-11-19 06:19:03,388 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.18 vs. 
limit=22.5 2023-11-19 06:19:03,867 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.408e+01 8.547e+01 9.371e+01 1.021e+02 1.350e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-19 06:19:22,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=605346.6666666666, ans=0.125 2023-11-19 06:19:24,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=605346.6666666666, ans=0.125 2023-11-19 06:19:26,466 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 6650, loss[loss=0.08622, simple_loss=0.1078, pruned_loss=0.02206, audio_tagging_loss=0.01025, over 15859.00 frames. ], tot_loss[loss=0.09003, simple_loss=0.1073, pruned_loss=0.02553, audio_tagging_loss=0.01084, over 3044931.80 frames. ], batch size: 57, lr: 8.63e-03, grad_scale: 32.0 2023-11-19 06:19:40,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=605480.0, ans=0.04949747468305833 2023-11-19 06:19:45,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=605480.0, ans=0.0 2023-11-19 06:20:04,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=605613.3333333334, ans=0.09899494936611666 2023-11-19 06:20:12,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=605680.0, ans=0.0 2023-11-19 06:20:22,600 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 6700, loss[loss=0.09589, simple_loss=0.1153, pruned_loss=0.02693, audio_tagging_loss=0.01133, over 15523.00 frames. ], tot_loss[loss=0.09081, simple_loss=0.1085, pruned_loss=0.02577, audio_tagging_loss=0.01078, over 3042309.29 frames. ], batch size: 56, lr: 8.63e-03, grad_scale: 32.0 2023-11-19 06:20:23,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=605746.6666666666, ans=0.2 2023-11-19 06:20:29,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=605746.6666666666, ans=0.125 2023-11-19 06:20:49,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=605880.0, ans=0.125 2023-11-19 06:20:51,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=605880.0, ans=0.015 2023-11-19 06:20:53,952 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.078e+01 8.187e+01 8.900e+01 9.907e+01 1.762e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-19 06:21:12,106 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.55 vs. limit=22.5 2023-11-19 06:21:18,303 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 6750, loss[loss=0.08848, simple_loss=0.1014, pruned_loss=0.02678, audio_tagging_loss=0.01102, over 14986.00 frames. ], tot_loss[loss=0.09122, simple_loss=0.1094, pruned_loss=0.026, audio_tagging_loss=0.01054, over 3039959.02 frames. ], batch size: 58, lr: 8.63e-03, grad_scale: 32.0 2023-11-19 06:21:26,305 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.86 vs. 
limit=22.5 2023-11-19 06:21:43,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=606213.3333333334, ans=0.125 2023-11-19 06:21:50,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=606280.0, ans=0.125 2023-11-19 06:21:52,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=606280.0, ans=0.125 2023-11-19 06:21:56,731 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.34 vs. limit=15.0 2023-11-19 06:21:59,466 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.33 vs. limit=15.0 2023-11-19 06:22:13,366 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 6800, loss[loss=0.07549, simple_loss=0.09907, pruned_loss=0.01646, audio_tagging_loss=0.009492, over 15848.00 frames. ], tot_loss[loss=0.09113, simple_loss=0.1094, pruned_loss=0.0259, audio_tagging_loss=0.01051, over 3043082.71 frames. ], batch size: 60, lr: 8.63e-03, grad_scale: 32.0 2023-11-19 06:22:26,690 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.34 vs. limit=12.0 2023-11-19 06:22:31,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=606480.0, ans=0.0 2023-11-19 06:22:43,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=606546.6666666666, ans=0.0 2023-11-19 06:22:45,965 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.152e+01 8.455e+01 9.265e+01 1.067e+02 1.623e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-19 06:22:54,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=606613.3333333334, ans=0.1 2023-11-19 06:23:08,797 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.60 vs. limit=22.5 2023-11-19 06:23:08,822 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.89 vs. limit=22.5 2023-11-19 06:23:09,235 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 6850, loss[loss=0.08632, simple_loss=0.11, pruned_loss=0.02271, audio_tagging_loss=0.008607, over 15257.00 frames. ], tot_loss[loss=0.09153, simple_loss=0.1104, pruned_loss=0.02596, audio_tagging_loss=0.01034, over 3043750.74 frames. ], batch size: 56, lr: 8.62e-03, grad_scale: 32.0 2023-11-19 06:23:41,276 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=606946.6666666666, ans=0.0 2023-11-19 06:23:53,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=607013.3333333334, ans=0.0 2023-11-19 06:24:04,760 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 6900, loss[loss=0.05511, simple_loss=0.06127, pruned_loss=0.01292, audio_tagging_loss=0.01155, over 14370.00 frames. ], tot_loss[loss=0.09074, simple_loss=0.1096, pruned_loss=0.02556, audio_tagging_loss=0.01039, over 3042723.35 frames. 
], batch size: 55, lr: 8.62e-03, grad_scale: 32.0 2023-11-19 06:24:11,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=607080.0, ans=0.125 2023-11-19 06:24:18,622 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.33 vs. limit=22.5 2023-11-19 06:24:31,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=607213.3333333334, ans=0.1 2023-11-19 06:24:34,712 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.47 vs. limit=15.0 2023-11-19 06:24:37,156 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.004e+01 8.334e+01 9.172e+01 9.913e+01 1.941e+02, threshold=1.834e+02, percent-clipped=1.0 2023-11-19 06:24:42,197 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=607280.0, ans=0.0 2023-11-19 06:24:48,205 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 06:24:52,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=607346.6666666666, ans=0.125 2023-11-19 06:24:59,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=607413.3333333334, ans=0.125 2023-11-19 06:25:00,439 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 6950, loss[loss=0.1072, simple_loss=0.136, pruned_loss=0.02987, audio_tagging_loss=0.009342, over 16292.00 frames. ], tot_loss[loss=0.09033, simple_loss=0.1091, pruned_loss=0.02528, audio_tagging_loss=0.01052, over 3050715.45 frames. ], batch size: 57, lr: 8.62e-03, grad_scale: 32.0 2023-11-19 06:25:09,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=607413.3333333334, ans=0.0 2023-11-19 06:25:22,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=607546.6666666666, ans=0.0 2023-11-19 06:25:49,465 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=607680.0, ans=0.125 2023-11-19 06:25:56,175 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.88 vs. limit=12.0 2023-11-19 06:25:56,729 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 7000, loss[loss=0.0901, simple_loss=0.1082, pruned_loss=0.02455, audio_tagging_loss=0.01143, over 14710.00 frames. ], tot_loss[loss=0.08998, simple_loss=0.1086, pruned_loss=0.02516, audio_tagging_loss=0.01051, over 3048037.39 frames. 
], batch size: 55, lr: 8.62e-03, grad_scale: 32.0 2023-11-19 06:26:13,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=607813.3333333334, ans=0.0 2023-11-19 06:26:14,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=607813.3333333334, ans=0.125 2023-11-19 06:26:15,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=607813.3333333334, ans=0.125 2023-11-19 06:26:25,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=607880.0, ans=0.125 2023-11-19 06:26:28,333 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.831e+01 8.487e+01 9.225e+01 1.011e+02 1.458e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-19 06:26:28,873 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.90 vs. limit=15.0 2023-11-19 06:26:29,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=607946.6666666666, ans=0.5 2023-11-19 06:26:48,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=608013.3333333334, ans=0.125 2023-11-19 06:26:52,397 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 7050, loss[loss=0.1065, simple_loss=0.1237, pruned_loss=0.03076, audio_tagging_loss=0.01386, over 16458.00 frames. ], tot_loss[loss=0.09124, simple_loss=0.11, pruned_loss=0.02575, audio_tagging_loss=0.01051, over 3047834.00 frames. ], batch size: 62, lr: 8.61e-03, grad_scale: 32.0 2023-11-19 06:27:01,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=608080.0, ans=0.1 2023-11-19 06:27:15,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=608213.3333333334, ans=0.125 2023-11-19 06:27:22,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=608213.3333333334, ans=0.1 2023-11-19 06:27:35,173 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=608280.0, ans=0.125 2023-11-19 06:27:48,155 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 7100, loss[loss=0.1128, simple_loss=0.1439, pruned_loss=0.03264, audio_tagging_loss=0.008188, over 14140.00 frames. ], tot_loss[loss=0.09167, simple_loss=0.1098, pruned_loss=0.02598, audio_tagging_loss=0.01077, over 3043918.72 frames. ], batch size: 53, lr: 8.61e-03, grad_scale: 32.0 2023-11-19 06:28:19,656 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.211e+01 9.012e+01 9.917e+01 1.109e+02 1.355e+02, threshold=1.983e+02, percent-clipped=0.0 2023-11-19 06:28:31,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=608680.0, ans=0.125 2023-11-19 06:28:43,567 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 7150, loss[loss=0.0892, simple_loss=0.1004, pruned_loss=0.02715, audio_tagging_loss=0.01185, over 13883.00 frames. ], tot_loss[loss=0.0918, simple_loss=0.1101, pruned_loss=0.02595, audio_tagging_loss=0.01082, over 3038193.28 frames. 
], batch size: 52, lr: 8.61e-03, grad_scale: 32.0 2023-11-19 06:29:10,355 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.58 vs. limit=15.0 2023-11-19 06:29:13,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=608880.0, ans=0.0 2023-11-19 06:29:22,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=608946.6666666666, ans=0.125 2023-11-19 06:29:38,978 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 7200, loss[loss=0.05115, simple_loss=0.05393, pruned_loss=0.01026, audio_tagging_loss=0.01392, over 14116.00 frames. ], tot_loss[loss=0.09086, simple_loss=0.1088, pruned_loss=0.02554, audio_tagging_loss=0.01092, over 3028269.64 frames. ], batch size: 57, lr: 8.61e-03, grad_scale: 32.0 2023-11-19 06:29:44,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=609080.0, ans=0.125 2023-11-19 06:29:52,160 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.16 vs. limit=15.0 2023-11-19 06:30:10,691 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.882e+01 8.596e+01 9.352e+01 1.022e+02 1.385e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-19 06:30:14,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=609280.0, ans=0.125 2023-11-19 06:30:21,954 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=609346.6666666666, ans=0.0 2023-11-19 06:30:21,998 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=609346.6666666666, ans=0.125 2023-11-19 06:30:26,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=609346.6666666666, ans=0.125 2023-11-19 06:30:33,342 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 7250, loss[loss=0.09403, simple_loss=0.1203, pruned_loss=0.02811, audio_tagging_loss=0.005752, over 15982.00 frames. ], tot_loss[loss=0.0911, simple_loss=0.1094, pruned_loss=0.02555, audio_tagging_loss=0.01088, over 3032053.31 frames. ], batch size: 57, lr: 8.61e-03, grad_scale: 32.0 2023-11-19 06:30:45,863 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 06:31:03,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=609546.6666666666, ans=0.09899494936611666 2023-11-19 06:31:08,307 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.53 vs. limit=15.0 2023-11-19 06:31:14,706 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.32 vs. limit=15.0 2023-11-19 06:31:19,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=609680.0, ans=0.2 2023-11-19 06:31:28,339 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 7300, loss[loss=0.08782, simple_loss=0.1008, pruned_loss=0.02647, audio_tagging_loss=0.01094, over 14940.00 frames. 
], tot_loss[loss=0.09047, simple_loss=0.1082, pruned_loss=0.02549, audio_tagging_loss=0.01088, over 3036805.76 frames. ], batch size: 58, lr: 8.60e-03, grad_scale: 32.0 2023-11-19 06:31:40,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=609813.3333333334, ans=0.125 2023-11-19 06:31:59,355 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2023-11-19 06:31:59,849 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.576e+01 9.670e+01 1.044e+02 1.433e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-19 06:32:05,874 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=609946.6666666666, ans=0.0 2023-11-19 06:32:07,204 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.44 vs. limit=15.0 2023-11-19 06:32:19,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=610013.3333333334, ans=0.1 2023-11-19 06:32:23,124 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 7350, loss[loss=0.07458, simple_loss=0.08852, pruned_loss=0.02089, audio_tagging_loss=0.009428, over 15898.00 frames. ], tot_loss[loss=0.09087, simple_loss=0.1088, pruned_loss=0.02575, audio_tagging_loss=0.01071, over 3042244.26 frames. ], batch size: 60, lr: 8.60e-03, grad_scale: 32.0 2023-11-19 06:32:38,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=610146.6666666666, ans=0.125 2023-11-19 06:32:56,475 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.10 vs. limit=15.0 2023-11-19 06:33:00,320 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 06:33:14,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.19 vs. limit=15.0 2023-11-19 06:33:15,631 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=610346.6666666666, ans=0.125 2023-11-19 06:33:18,540 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 7400, loss[loss=0.115, simple_loss=0.1481, pruned_loss=0.03242, audio_tagging_loss=0.008508, over 16093.00 frames. ], tot_loss[loss=0.09034, simple_loss=0.1088, pruned_loss=0.02546, audio_tagging_loss=0.01048, over 3041960.12 frames. 
], batch size: 58, lr: 8.60e-03, grad_scale: 32.0 2023-11-19 06:33:26,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=610413.3333333334, ans=0.0 2023-11-19 06:33:42,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=610546.6666666666, ans=0.125 2023-11-19 06:33:51,303 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.147e+01 8.574e+01 9.523e+01 1.112e+02 1.475e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-19 06:33:53,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=610613.3333333334, ans=0.125 2023-11-19 06:34:01,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=610613.3333333334, ans=0.1 2023-11-19 06:34:04,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=610680.0, ans=0.025 2023-11-19 06:34:04,740 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.48 vs. limit=15.0 2023-11-19 06:34:14,353 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 7450, loss[loss=0.07647, simple_loss=0.09652, pruned_loss=0.01925, audio_tagging_loss=0.008965, over 13928.00 frames. ], tot_loss[loss=0.09024, simple_loss=0.1087, pruned_loss=0.02546, audio_tagging_loss=0.01043, over 3040488.27 frames. ], batch size: 55, lr: 8.60e-03, grad_scale: 32.0 2023-11-19 06:34:15,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=610746.6666666666, ans=0.025 2023-11-19 06:34:23,587 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.78 vs. limit=15.0 2023-11-19 06:34:35,424 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.56 vs. limit=15.0 2023-11-19 06:34:38,416 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=610880.0, ans=0.125 2023-11-19 06:34:39,477 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=610880.0, ans=0.0 2023-11-19 06:34:41,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=610880.0, ans=0.125 2023-11-19 06:35:05,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=611013.3333333334, ans=0.0 2023-11-19 06:35:07,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=611013.3333333334, ans=0.0 2023-11-19 06:35:10,292 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 7500, loss[loss=0.09249, simple_loss=0.1061, pruned_loss=0.0279, audio_tagging_loss=0.01153, over 15127.00 frames. ], tot_loss[loss=0.09098, simple_loss=0.1099, pruned_loss=0.02568, audio_tagging_loss=0.01033, over 3044840.98 frames. 
], batch size: 57, lr: 8.59e-03, grad_scale: 32.0 2023-11-19 06:35:13,613 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.08 vs. limit=22.5 2023-11-19 06:35:19,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=611080.0, ans=0.125 2023-11-19 06:35:20,057 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.07 vs. limit=10.0 2023-11-19 06:35:21,755 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=611146.6666666666, ans=0.05 2023-11-19 06:35:33,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=611213.3333333334, ans=0.125 2023-11-19 06:35:36,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=611213.3333333334, ans=0.1 2023-11-19 06:35:39,939 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.27 vs. limit=15.0 2023-11-19 06:35:42,593 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.328e+01 8.409e+01 9.431e+01 1.041e+02 1.502e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-19 06:36:05,195 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 7550, loss[loss=0.09787, simple_loss=0.1187, pruned_loss=0.02905, audio_tagging_loss=0.009473, over 16333.00 frames. ], tot_loss[loss=0.09104, simple_loss=0.1097, pruned_loss=0.02582, audio_tagging_loss=0.01037, over 3042079.75 frames. ], batch size: 61, lr: 8.59e-03, grad_scale: 32.0 2023-11-19 06:36:05,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=611413.3333333334, ans=0.1 2023-11-19 06:36:06,494 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=611413.3333333334, ans=0.125 2023-11-19 06:36:07,133 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.37 vs. limit=5.0 2023-11-19 06:36:12,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=611413.3333333334, ans=0.125 2023-11-19 06:36:35,917 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.44 vs. limit=15.0 2023-11-19 06:36:38,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=611613.3333333334, ans=0.0 2023-11-19 06:36:47,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=611613.3333333334, ans=0.0 2023-11-19 06:36:51,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=611680.0, ans=0.125 2023-11-19 06:36:56,635 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.63 vs. 
limit=22.5 2023-11-19 06:36:59,384 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 7600, loss[loss=0.09454, simple_loss=0.121, pruned_loss=0.02307, audio_tagging_loss=0.01098, over 15454.00 frames. ], tot_loss[loss=0.0897, simple_loss=0.1076, pruned_loss=0.02537, audio_tagging_loss=0.01055, over 3045108.97 frames. ], batch size: 56, lr: 8.59e-03, grad_scale: 32.0 2023-11-19 06:37:04,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=611746.6666666666, ans=0.05 2023-11-19 06:37:20,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=611813.3333333334, ans=0.2 2023-11-19 06:37:32,017 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.740e+01 8.282e+01 9.217e+01 9.907e+01 1.227e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-19 06:37:42,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=611946.6666666666, ans=0.0 2023-11-19 06:37:44,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=612013.3333333334, ans=0.125 2023-11-19 06:37:52,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=612013.3333333334, ans=0.0 2023-11-19 06:37:56,182 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 7650, loss[loss=0.06283, simple_loss=0.07532, pruned_loss=0.01473, audio_tagging_loss=0.01044, over 15937.00 frames. ], tot_loss[loss=0.08984, simple_loss=0.1079, pruned_loss=0.02539, audio_tagging_loss=0.01049, over 3040952.59 frames. ], batch size: 64, lr: 8.59e-03, grad_scale: 32.0 2023-11-19 06:38:01,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=612080.0, ans=0.125 2023-11-19 06:38:06,314 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 06:38:20,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=612213.3333333334, ans=0.0 2023-11-19 06:38:46,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=612346.6666666666, ans=0.125 2023-11-19 06:38:48,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=612346.6666666666, ans=0.05 2023-11-19 06:38:51,529 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 7700, loss[loss=0.0598, simple_loss=0.05904, pruned_loss=0.01335, audio_tagging_loss=0.01693, over 16522.00 frames. ], tot_loss[loss=0.08985, simple_loss=0.108, pruned_loss=0.02537, audio_tagging_loss=0.01048, over 3038008.87 frames. ], batch size: 65, lr: 8.58e-03, grad_scale: 32.0 2023-11-19 06:39:23,608 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.345e+01 8.913e+01 9.609e+01 1.128e+02 1.741e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-19 06:39:30,012 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.33 vs. limit=6.0 2023-11-19 06:39:35,156 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.38 vs. 
limit=15.0 2023-11-19 06:39:46,133 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 7750, loss[loss=0.09326, simple_loss=0.1129, pruned_loss=0.02273, audio_tagging_loss=0.0141, over 14848.00 frames. ], tot_loss[loss=0.09055, simple_loss=0.109, pruned_loss=0.0255, audio_tagging_loss=0.01053, over 3035373.32 frames. ], batch size: 56, lr: 8.58e-03, grad_scale: 32.0 2023-11-19 06:39:49,756 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.41 vs. limit=15.0 2023-11-19 06:40:11,936 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.05 vs. limit=22.5 2023-11-19 06:40:15,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=612880.0, ans=0.0 2023-11-19 06:40:17,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=612880.0, ans=0.125 2023-11-19 06:40:31,760 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=613013.3333333334, ans=0.125 2023-11-19 06:40:42,169 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 7800, loss[loss=0.08915, simple_loss=0.1041, pruned_loss=0.02423, audio_tagging_loss=0.01284, over 15512.00 frames. ], tot_loss[loss=0.09141, simple_loss=0.1103, pruned_loss=0.02576, audio_tagging_loss=0.01053, over 3039358.63 frames. ], batch size: 57, lr: 8.58e-03, grad_scale: 32.0 2023-11-19 06:40:50,219 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.87 vs. limit=15.0 2023-11-19 06:41:13,882 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.392e+01 9.222e+01 1.047e+02 1.457e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-19 06:41:36,719 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.30 vs. limit=15.0 2023-11-19 06:41:40,133 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.57 vs. limit=15.0 2023-11-19 06:41:40,455 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 7850, loss[loss=0.09905, simple_loss=0.1215, pruned_loss=0.0288, audio_tagging_loss=0.009503, over 14889.00 frames. ], tot_loss[loss=0.09149, simple_loss=0.1102, pruned_loss=0.02582, audio_tagging_loss=0.01059, over 3044921.00 frames. ], batch size: 56, lr: 8.58e-03, grad_scale: 32.0 2023-11-19 06:41:44,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=613413.3333333334, ans=15.0 2023-11-19 06:41:49,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=613413.3333333334, ans=0.2 2023-11-19 06:42:09,552 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.40 vs. limit=22.5 2023-11-19 06:42:35,050 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 7900, loss[loss=0.1027, simple_loss=0.1221, pruned_loss=0.02998, audio_tagging_loss=0.01165, over 16074.00 frames. ], tot_loss[loss=0.0911, simple_loss=0.1093, pruned_loss=0.02569, audio_tagging_loss=0.01076, over 3046906.63 frames. 
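[Annotation] In the optim.py "Clipping_scale=2.0, grad-norm quartiles ..." lines, the logged threshold tracks twice the median gradient norm (e.g. 2 x 9.222e+01 = 1.844e+02 just above), i.e. threshold = clipping_scale x a running median. A simplified sketch of such median-based clipping, assuming that scheme; the real ScaledAdam logic in optim.py keeps its statistics differently:

```python
import torch

# Simplified median-based gradient clipping, consistent with the logged
# "threshold = Clipping_scale * median grad-norm" pattern; not the exact
# ScaledAdam implementation. `recent_norms` is a caller-maintained list.
def clip_grad(parameters, recent_norms, clipping_scale=2.0):
    grads = [p.grad for p in parameters if p.grad is not None]
    norm = torch.norm(torch.stack([g.norm() for g in grads]))
    recent_norms.append(norm.item())
    threshold = clipping_scale * torch.tensor(recent_norms).median()
    if norm > threshold:
        for g in grads:
            g.mul_(threshold / norm)  # scale grads down to the threshold
    return norm
```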
], batch size: 60, lr: 8.58e-03, grad_scale: 32.0 2023-11-19 06:42:55,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=613813.3333333334, ans=12.0 2023-11-19 06:43:07,836 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.644e+01 9.372e+01 1.085e+02 1.414e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-19 06:43:09,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=613946.6666666666, ans=0.2 2023-11-19 06:43:10,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=613946.6666666666, ans=0.05 2023-11-19 06:43:12,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=613946.6666666666, ans=0.0 2023-11-19 06:43:15,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=613946.6666666666, ans=0.125 2023-11-19 06:43:21,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=614013.3333333334, ans=0.0 2023-11-19 06:43:28,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=614013.3333333334, ans=0.125 2023-11-19 06:43:31,071 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 7950, loss[loss=0.0827, simple_loss=0.09758, pruned_loss=0.02247, audio_tagging_loss=0.01144, over 14925.00 frames. ], tot_loss[loss=0.09048, simple_loss=0.1084, pruned_loss=0.02542, audio_tagging_loss=0.01085, over 3043899.34 frames. ], batch size: 56, lr: 8.57e-03, grad_scale: 32.0 2023-11-19 06:43:33,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=614080.0, ans=0.125 2023-11-19 06:43:40,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=614080.0, ans=0.2 2023-11-19 06:43:44,724 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 06:43:48,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=614146.6666666666, ans=0.125 2023-11-19 06:43:54,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=614213.3333333334, ans=0.2 2023-11-19 06:44:05,876 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=614280.0, ans=0.125 2023-11-19 06:44:11,508 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.22 vs. 
limit=15.0 2023-11-19 06:44:15,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=614346.6666666666, ans=0.0 2023-11-19 06:44:20,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=614346.6666666666, ans=0.125 2023-11-19 06:44:23,981 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.18 vs. limit=15.0 2023-11-19 06:44:26,292 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 8000, loss[loss=0.09311, simple_loss=0.1138, pruned_loss=0.02617, audio_tagging_loss=0.01003, over 15715.00 frames. ], tot_loss[loss=0.09044, simple_loss=0.1084, pruned_loss=0.02544, audio_tagging_loss=0.01081, over 3041293.33 frames. ], batch size: 57, lr: 8.57e-03, grad_scale: 32.0 2023-11-19 06:44:39,623 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.28 vs. limit=15.0 2023-11-19 06:44:42,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.05 vs. limit=22.5 2023-11-19 06:44:51,157 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.80 vs. limit=22.5 2023-11-19 06:44:57,966 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.896e+01 9.629e+01 1.081e+02 1.400e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-19 06:45:02,121 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.11 vs. limit=15.0 2023-11-19 06:45:06,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=614613.3333333334, ans=0.125 2023-11-19 06:45:11,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=614680.0, ans=0.0 2023-11-19 06:45:16,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=614680.0, ans=0.1 2023-11-19 06:45:17,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=614680.0, ans=0.125 2023-11-19 06:45:18,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=614680.0, ans=0.125 2023-11-19 06:45:21,587 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 8050, loss[loss=0.1194, simple_loss=0.1427, pruned_loss=0.03939, audio_tagging_loss=0.008652, over 15717.00 frames. ], tot_loss[loss=0.09039, simple_loss=0.1079, pruned_loss=0.02553, audio_tagging_loss=0.01089, over 3033335.24 frames. ], batch size: 56, lr: 8.57e-03, grad_scale: 64.0 2023-11-19 06:45:22,378 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.73 vs. 
limit=15.0 2023-11-19 06:45:25,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=614746.6666666666, ans=0.125 2023-11-19 06:46:17,957 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 8100, loss[loss=0.09772, simple_loss=0.1193, pruned_loss=0.02753, audio_tagging_loss=0.01057, over 14634.00 frames. ], tot_loss[loss=0.09018, simple_loss=0.1078, pruned_loss=0.02544, audio_tagging_loss=0.01084, over 3031644.60 frames. ], batch size: 53, lr: 8.57e-03, grad_scale: 64.0 2023-11-19 06:46:40,749 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.22 vs. limit=10.0 2023-11-19 06:46:48,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=615213.3333333334, ans=0.125 2023-11-19 06:46:49,784 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.125e+01 8.425e+01 9.284e+01 1.022e+02 1.413e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-19 06:46:56,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=615280.0, ans=0.125 2023-11-19 06:46:59,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=615280.0, ans=0.04949747468305833 2023-11-19 06:47:13,502 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 8150, loss[loss=0.08657, simple_loss=0.1171, pruned_loss=0.01942, audio_tagging_loss=0.008606, over 14701.00 frames. ], tot_loss[loss=0.08966, simple_loss=0.1077, pruned_loss=0.02518, audio_tagging_loss=0.01065, over 3036387.81 frames. ], batch size: 53, lr: 8.56e-03, grad_scale: 64.0 2023-11-19 06:47:20,693 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.35 vs. limit=15.0 2023-11-19 06:47:38,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=615546.6666666666, ans=0.125 2023-11-19 06:47:43,075 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.13 vs. limit=22.5 2023-11-19 06:47:58,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=615680.0, ans=0.125 2023-11-19 06:47:59,823 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.91 vs. limit=10.0 2023-11-19 06:48:08,938 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 8200, loss[loss=0.09996, simple_loss=0.1094, pruned_loss=0.03428, audio_tagging_loss=0.011, over 15928.00 frames. ], tot_loss[loss=0.08973, simple_loss=0.1079, pruned_loss=0.02519, audio_tagging_loss=0.01059, over 3036720.54 frames. ], batch size: 59, lr: 8.56e-03, grad_scale: 64.0 2023-11-19 06:48:08,981 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 06:48:09,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=615746.6666666666, ans=0.0 2023-11-19 06:48:16,633 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=615746.6666666666, ans=0.125 2023-11-19 06:48:21,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=615813.3333333334, ans=0.02 2023-11-19 06:48:22,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=615813.3333333334, ans=0.125 2023-11-19 06:48:27,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=615813.3333333334, ans=0.0 2023-11-19 06:48:32,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=615880.0, ans=0.125 2023-11-19 06:48:34,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=615880.0, ans=0.05 2023-11-19 06:48:39,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=615880.0, ans=0.125 2023-11-19 06:48:41,184 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 8.380e+01 9.339e+01 1.044e+02 1.327e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-19 06:48:54,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=616013.3333333334, ans=0.0 2023-11-19 06:49:05,243 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 8250, loss[loss=0.09472, simple_loss=0.1158, pruned_loss=0.0302, audio_tagging_loss=0.006623, over 14265.00 frames. ], tot_loss[loss=0.08993, simple_loss=0.1085, pruned_loss=0.02533, audio_tagging_loss=0.01033, over 3043392.09 frames. ], batch size: 55, lr: 8.56e-03, grad_scale: 64.0 2023-11-19 06:49:14,310 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.41 vs. limit=22.5 2023-11-19 06:49:20,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.13 vs. limit=6.0 2023-11-19 06:49:48,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=616346.6666666666, ans=0.0 2023-11-19 06:49:49,377 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.55 vs. limit=15.0 2023-11-19 06:50:00,758 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 8300, loss[loss=0.07387, simple_loss=0.08696, pruned_loss=0.02123, audio_tagging_loss=0.009157, over 16292.00 frames. ], tot_loss[loss=0.08891, simple_loss=0.1071, pruned_loss=0.02499, audio_tagging_loss=0.01036, over 3039406.54 frames. 
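[Annotation] The "Exclude cut" warnings follow a simple rule: a pruned-transducer alignment needs at least one encoder frame per output token, and after the roughly 4x subsampling a 100-frame cut keeps only 23 frames, fewer than its 24 tokens. A sketch of the check, assuming the ((T - 7) // 2 + 1) // 2 frame arithmetic common in these recipes (the assumption matches the logged 100 -> 23 example):

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed frame arithmetic for the ~4x subsampling front-end;
    # reproduces the logged example (100 frames -> 23).
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer alignment needs at least one frame per token.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)   # excluded, as in the warnings above
```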
], batch size: 63, lr: 8.56e-03, grad_scale: 32.0 2023-11-19 06:50:02,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=616413.3333333334, ans=0.015 2023-11-19 06:50:05,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=616413.3333333334, ans=0.2 2023-11-19 06:50:34,216 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.747e+01 9.487e+01 1.050e+02 1.506e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-19 06:50:56,411 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 8350, loss[loss=0.0912, simple_loss=0.108, pruned_loss=0.02659, audio_tagging_loss=0.01058, over 15969.00 frames. ], tot_loss[loss=0.08909, simple_loss=0.1074, pruned_loss=0.02504, audio_tagging_loss=0.01036, over 3034848.55 frames. ], batch size: 61, lr: 8.55e-03, grad_scale: 32.0 2023-11-19 06:51:05,188 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=616746.6666666666, ans=0.125 2023-11-19 06:51:13,538 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=616813.3333333334, ans=0.0 2023-11-19 06:51:17,784 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=616880.0, ans=0.125 2023-11-19 06:51:30,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=616946.6666666666, ans=0.125 2023-11-19 06:51:31,979 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=616946.6666666666, ans=0.2 2023-11-19 06:51:51,343 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 8400, loss[loss=0.07311, simple_loss=0.0795, pruned_loss=0.02316, audio_tagging_loss=0.0102, over 16133.00 frames. ], tot_loss[loss=0.08922, simple_loss=0.1077, pruned_loss=0.02511, audio_tagging_loss=0.01026, over 3040355.90 frames. ], batch size: 62, lr: 8.55e-03, grad_scale: 32.0 2023-11-19 06:52:22,360 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.27 vs. limit=12.0 2023-11-19 06:52:25,115 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.224e+01 8.578e+01 9.429e+01 1.025e+02 1.342e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-19 06:52:47,680 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 8450, loss[loss=0.1037, simple_loss=0.1149, pruned_loss=0.03757, audio_tagging_loss=0.008651, over 14945.00 frames. ], tot_loss[loss=0.08854, simple_loss=0.1064, pruned_loss=0.02488, audio_tagging_loss=0.01047, over 3038825.89 frames. ], batch size: 57, lr: 8.55e-03, grad_scale: 32.0 2023-11-19 06:53:06,067 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0 2023-11-19 06:53:13,258 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=617546.6666666666, ans=0.2 2023-11-19 06:53:13,491 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.11 vs. 
limit=22.5 2023-11-19 06:53:14,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=617546.6666666666, ans=0.125 2023-11-19 06:53:31,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=617680.0, ans=0.05 2023-11-19 06:53:43,102 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 8500, loss[loss=0.1034, simple_loss=0.1244, pruned_loss=0.03112, audio_tagging_loss=0.01012, over 13808.00 frames. ], tot_loss[loss=0.08911, simple_loss=0.107, pruned_loss=0.02503, audio_tagging_loss=0.01057, over 3043986.39 frames. ], batch size: 53, lr: 8.55e-03, grad_scale: 32.0 2023-11-19 06:53:48,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=617746.6666666666, ans=0.125 2023-11-19 06:53:50,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=617746.6666666666, ans=0.125 2023-11-19 06:53:53,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=617813.3333333334, ans=0.125 2023-11-19 06:54:00,180 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=617813.3333333334, ans=0.0 2023-11-19 06:54:08,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=617880.0, ans=0.2 2023-11-19 06:54:16,818 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.808e+01 8.976e+01 1.039e+02 1.170e+02 1.800e+02, threshold=2.077e+02, percent-clipped=0.0 2023-11-19 06:54:20,254 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 06:54:22,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=617946.6666666666, ans=0.125 2023-11-19 06:54:26,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=618013.3333333334, ans=0.2 2023-11-19 06:54:38,528 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 8550, loss[loss=0.06402, simple_loss=0.07287, pruned_loss=0.01667, audio_tagging_loss=0.01091, over 13891.00 frames. ], tot_loss[loss=0.08882, simple_loss=0.1065, pruned_loss=0.02492, audio_tagging_loss=0.01066, over 3040685.00 frames. ], batch size: 54, lr: 8.55e-03, grad_scale: 16.0 2023-11-19 06:54:43,996 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=618080.0, ans=0.125 2023-11-19 06:55:34,196 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 8600, loss[loss=0.08427, simple_loss=0.1061, pruned_loss=0.02073, audio_tagging_loss=0.01047, over 15109.00 frames. ], tot_loss[loss=0.08896, simple_loss=0.1064, pruned_loss=0.02501, audio_tagging_loss=0.01074, over 3035345.92 frames. 
], batch size: 55, lr: 8.54e-03, grad_scale: 16.0 2023-11-19 06:55:38,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=618413.3333333334, ans=0.1 2023-11-19 06:55:46,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=618480.0, ans=0.125 2023-11-19 06:56:06,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=618613.3333333334, ans=0.2 2023-11-19 06:56:09,131 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.491e+01 8.546e+01 9.397e+01 1.054e+02 1.390e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-19 06:56:14,018 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.40 vs. limit=15.0 2023-11-19 06:56:14,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=618613.3333333334, ans=0.0 2023-11-19 06:56:17,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=618613.3333333334, ans=0.0 2023-11-19 06:56:19,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=618680.0, ans=0.125 2023-11-19 06:56:29,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=618746.6666666666, ans=0.5 2023-11-19 06:56:30,045 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 8650, loss[loss=0.07184, simple_loss=0.08195, pruned_loss=0.02095, audio_tagging_loss=0.009915, over 15336.00 frames. ], tot_loss[loss=0.08926, simple_loss=0.1072, pruned_loss=0.02495, audio_tagging_loss=0.01072, over 3032885.70 frames. ], batch size: 59, lr: 8.54e-03, grad_scale: 16.0 2023-11-19 06:56:30,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=618746.6666666666, ans=0.125 2023-11-19 06:56:56,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=618880.0, ans=0.1 2023-11-19 06:57:24,839 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 8700, loss[loss=0.0749, simple_loss=0.0936, pruned_loss=0.01787, audio_tagging_loss=0.01024, over 14922.00 frames. ], tot_loss[loss=0.08959, simple_loss=0.1075, pruned_loss=0.02508, audio_tagging_loss=0.01079, over 3033668.21 frames. 
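[Annotation] The grad_scale value in these lines moves between 16.0, 32.0 and 64.0 because fp16 training uses dynamic loss scaling: the scale grows after a run of overflow-free steps and is halved when an inf/nan gradient appears. A minimal sketch of the standard PyTorch AMP pattern; the actual loop in train_asr.py may differ in details:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0,
                                   growth_factor=2.0, backoff_factor=0.5)

def fp16_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # skips the step if grads overflowed
    scaler.update()                 # grows or backs off the scale
    return scaler.get_scale()       # the value logged as grad_scale
```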
], batch size: 56, lr: 8.54e-03, grad_scale: 16.0 2023-11-19 06:57:25,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=619080.0, ans=0.0 2023-11-19 06:57:28,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=619080.0, ans=0.125 2023-11-19 06:57:47,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=619213.3333333334, ans=0.1 2023-11-19 06:57:48,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=619213.3333333334, ans=0.125 2023-11-19 06:58:00,014 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.397e+01 9.176e+01 9.937e+01 1.111e+02 1.947e+02, threshold=1.987e+02, percent-clipped=1.0 2023-11-19 06:58:17,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=619346.6666666666, ans=0.1 2023-11-19 06:58:20,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=619413.3333333334, ans=0.2 2023-11-19 06:58:21,829 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 8750, loss[loss=0.09475, simple_loss=0.1097, pruned_loss=0.02889, audio_tagging_loss=0.011, over 15170.00 frames. ], tot_loss[loss=0.09058, simple_loss=0.1085, pruned_loss=0.02554, audio_tagging_loss=0.01077, over 3029952.11 frames. ], batch size: 57, lr: 8.54e-03, grad_scale: 16.0 2023-11-19 06:58:37,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=619480.0, ans=0.125 2023-11-19 06:58:52,276 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.22 vs. limit=10.0 2023-11-19 06:59:11,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=619680.0, ans=0.125 2023-11-19 06:59:16,511 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 8800, loss[loss=0.07869, simple_loss=0.0954, pruned_loss=0.0198, audio_tagging_loss=0.01119, over 14670.00 frames. ], tot_loss[loss=0.09097, simple_loss=0.109, pruned_loss=0.02555, audio_tagging_loss=0.01092, over 3032480.93 frames. ], batch size: 56, lr: 8.53e-03, grad_scale: 32.0 2023-11-19 06:59:47,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=619880.0, ans=0.09899494936611666 2023-11-19 06:59:49,163 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.06 vs. 
limit=22.5 2023-11-19 06:59:50,693 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.071e+01 8.376e+01 8.906e+01 9.929e+01 1.528e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-19 06:59:53,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=619946.6666666666, ans=0.0 2023-11-19 06:59:55,716 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:00:03,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=620013.3333333334, ans=0.1 2023-11-19 07:00:08,516 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.650e-02 2023-11-19 07:00:11,477 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 8850, loss[loss=0.08322, simple_loss=0.1028, pruned_loss=0.02139, audio_tagging_loss=0.01044, over 16184.00 frames. ], tot_loss[loss=0.09009, simple_loss=0.108, pruned_loss=0.02517, audio_tagging_loss=0.0109, over 3037619.17 frames. ], batch size: 58, lr: 8.53e-03, grad_scale: 32.0 2023-11-19 07:00:23,164 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 07:00:28,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=620146.6666666666, ans=0.2 2023-11-19 07:00:37,729 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=620213.3333333334, ans=0.125 2023-11-19 07:00:39,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=620213.3333333334, ans=0.125 2023-11-19 07:00:45,090 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:00:48,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=620280.0, ans=0.125 2023-11-19 07:01:07,559 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 8900, loss[loss=0.08852, simple_loss=0.1091, pruned_loss=0.02386, audio_tagging_loss=0.01012, over 15933.00 frames. ], tot_loss[loss=0.0892, simple_loss=0.107, pruned_loss=0.02489, audio_tagging_loss=0.01082, over 3051454.80 frames. 
], batch size: 59, lr: 8.53e-03, grad_scale: 32.0 2023-11-19 07:01:10,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=620413.3333333334, ans=0.125 2023-11-19 07:01:11,982 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=620413.3333333334, ans=0.0 2023-11-19 07:01:27,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=620480.0, ans=0.0 2023-11-19 07:01:36,642 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=620546.6666666666, ans=0.125 2023-11-19 07:01:42,272 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.959e+01 8.332e+01 9.247e+01 1.033e+02 1.504e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-19 07:01:47,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=620613.3333333334, ans=0.125 2023-11-19 07:01:56,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=620680.0, ans=0.125 2023-11-19 07:01:59,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=620680.0, ans=0.125 2023-11-19 07:02:02,963 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 8950, loss[loss=0.06712, simple_loss=0.07763, pruned_loss=0.0201, audio_tagging_loss=0.008212, over 14687.00 frames. ], tot_loss[loss=0.08898, simple_loss=0.1069, pruned_loss=0.02482, audio_tagging_loss=0.01073, over 3047510.70 frames. ], batch size: 56, lr: 8.53e-03, grad_scale: 16.0 2023-11-19 07:02:09,427 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=620746.6666666666, ans=0.125 2023-11-19 07:02:35,458 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=620946.6666666666, ans=0.0 2023-11-19 07:02:38,009 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.75 vs. limit=15.0 2023-11-19 07:02:52,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=621013.3333333334, ans=0.1 2023-11-19 07:02:57,849 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 9000, loss[loss=0.09658, simple_loss=0.1157, pruned_loss=0.02801, audio_tagging_loss=0.01071, over 14464.00 frames. ], tot_loss[loss=0.08964, simple_loss=0.1077, pruned_loss=0.02513, audio_tagging_loss=0.01067, over 3041127.81 frames. ], batch size: 53, lr: 8.52e-03, grad_scale: 16.0 2023-11-19 07:02:57,850 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-19 07:03:19,515 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3039, 4.9898, 4.8269, 5.1686], device='cuda:3') 2023-11-19 07:03:30,607 INFO [train_asr.py:1147] (3/4) Epoch 8, validation: loss=0.06719, simple_loss=0.05665, pruned_loss=0.006997, audio_tagging_loss=0.03186, over 4681554.00 frames. 
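[Annotation] The attn_weights_entropy diagnostics printed around each validation pass are per-head entropies of the attention distributions; higher values mean flatter attention over source positions. A hedged sketch of the computation (the tensor layout in zipformer.py may be arranged differently):

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, tgt_len, src_len), each row a distribution over
    # source positions. Returns one mean entropy per head, like the
    # four-element tensors in the log lines above.
    p = attn.clamp_min(1e-20)
    return -(p * p.log()).sum(dim=-1).mean(dim=-1)

attn = torch.softmax(torch.randn(4, 10, 10), dim=-1)
print(attn_weights_entropy(attn))  # four values, one per head
```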
2023-11-19 07:03:30,607 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-19 07:03:42,006 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.97 vs. limit=15.0 2023-11-19 07:03:42,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=621146.6666666666, ans=0.125 2023-11-19 07:03:45,293 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.18 vs. limit=15.0 2023-11-19 07:04:05,132 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.930e+01 8.568e+01 9.180e+01 1.028e+02 1.650e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-19 07:04:05,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=621280.0, ans=0.04949747468305833 2023-11-19 07:04:09,633 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.09 vs. limit=22.5 2023-11-19 07:04:26,002 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 9050, loss[loss=0.1058, simple_loss=0.1333, pruned_loss=0.03061, audio_tagging_loss=0.008518, over 15527.00 frames. ], tot_loss[loss=0.08972, simple_loss=0.1079, pruned_loss=0.02523, audio_tagging_loss=0.01055, over 3042487.98 frames. ], batch size: 55, lr: 8.52e-03, grad_scale: 16.0 2023-11-19 07:04:26,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=621413.3333333334, ans=0.0 2023-11-19 07:04:38,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=621480.0, ans=0.0 2023-11-19 07:04:52,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=621546.6666666666, ans=0.125 2023-11-19 07:04:54,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=621546.6666666666, ans=0.1 2023-11-19 07:04:59,607 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.57 vs. limit=15.0 2023-11-19 07:05:04,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=621613.3333333334, ans=0.0 2023-11-19 07:05:05,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=621613.3333333334, ans=0.125 2023-11-19 07:05:16,516 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:05:20,479 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 9100, loss[loss=0.04787, simple_loss=0.04947, pruned_loss=0.01205, audio_tagging_loss=0.01108, over 15387.00 frames. ], tot_loss[loss=0.08969, simple_loss=0.108, pruned_loss=0.02516, audio_tagging_loss=0.01055, over 3049718.07 frames. ], batch size: 61, lr: 8.52e-03, grad_scale: 16.0 2023-11-19 07:05:30,922 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.06 vs. 
limit=10.0 2023-11-19 07:05:56,148 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.531e+01 9.413e+01 1.050e+02 2.515e+02, threshold=1.883e+02, percent-clipped=1.0 2023-11-19 07:05:58,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=621946.6666666666, ans=0.125 2023-11-19 07:06:11,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=622013.3333333334, ans=0.0 2023-11-19 07:06:15,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=622080.0, ans=0.125 2023-11-19 07:06:16,308 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 9150, loss[loss=0.07702, simple_loss=0.09014, pruned_loss=0.02292, audio_tagging_loss=0.009023, over 15259.00 frames. ], tot_loss[loss=0.08963, simple_loss=0.1082, pruned_loss=0.02508, audio_tagging_loss=0.01047, over 3046613.52 frames. ], batch size: 57, lr: 8.52e-03, grad_scale: 16.0 2023-11-19 07:06:25,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=622080.0, ans=0.125 2023-11-19 07:06:39,869 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:06:53,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=622280.0, ans=0.125 2023-11-19 07:07:01,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=622346.6666666666, ans=0.125 2023-11-19 07:07:12,189 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 9200, loss[loss=0.09736, simple_loss=0.1273, pruned_loss=0.02561, audio_tagging_loss=0.008116, over 15204.00 frames. ], tot_loss[loss=0.08923, simple_loss=0.1077, pruned_loss=0.02494, audio_tagging_loss=0.01043, over 3044589.01 frames. ], batch size: 56, lr: 8.52e-03, grad_scale: 16.0 2023-11-19 07:07:30,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=622480.0, ans=0.1 2023-11-19 07:07:35,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=622546.6666666666, ans=0.125 2023-11-19 07:07:48,136 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.097e+01 8.490e+01 9.151e+01 1.009e+02 3.492e+02, threshold=1.830e+02, percent-clipped=1.0 2023-11-19 07:07:48,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=622613.3333333334, ans=0.0 2023-11-19 07:07:53,457 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.50 vs. limit=6.0 2023-11-19 07:08:06,981 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 9250, loss[loss=0.09862, simple_loss=0.1236, pruned_loss=0.02709, audio_tagging_loss=0.009733, over 14333.00 frames. ], tot_loss[loss=0.08951, simple_loss=0.1077, pruned_loss=0.02517, audio_tagging_loss=0.0105, over 3048781.19 frames. 
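[Annotation] The ScheduledFloat lines that dominate this log each print a hyperparameter (dropout rates, balancer probabilities, skip rates, ...) whose value is a function of a smoothed batch count; "ans" is the current value. A minimal reimplementation of the idea, assuming simple linear interpolation between (batch_count, value) breakpoints; icefall's scaling.py version carries more machinery:

```python
def scheduled_float(batch_count: float, points) -> float:
    # points: sorted (batch_count, value) breakpoints; clamp outside the
    # range, interpolate linearly inside it.
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)

# e.g. a rate that decays from 0.3 to 0.1 over the first 20k batches has
# long since reached its floor at the batch counts logged here:
print(scheduled_float(622746.67, [(0.0, 0.3), (20000.0, 0.1)]))  # -> 0.1
```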
], batch size: 55, lr: 8.51e-03, grad_scale: 16.0 2023-11-19 07:08:09,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=622746.6666666666, ans=0.0 2023-11-19 07:08:10,485 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=622746.6666666666, ans=0.0 2023-11-19 07:08:19,946 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=622813.3333333334, ans=0.1 2023-11-19 07:08:22,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=622813.3333333334, ans=0.125 2023-11-19 07:08:24,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=622813.3333333334, ans=0.2 2023-11-19 07:08:48,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=622946.6666666666, ans=0.2 2023-11-19 07:09:03,092 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 9300, loss[loss=0.1191, simple_loss=0.1524, pruned_loss=0.03606, audio_tagging_loss=0.006853, over 14713.00 frames. ], tot_loss[loss=0.08994, simple_loss=0.1082, pruned_loss=0.02532, audio_tagging_loss=0.01051, over 3046111.80 frames. ], batch size: 57, lr: 8.51e-03, grad_scale: 16.0 2023-11-19 07:09:09,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=623080.0, ans=0.0 2023-11-19 07:09:23,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=623146.6666666666, ans=0.125 2023-11-19 07:09:31,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=623213.3333333334, ans=0.125 2023-11-19 07:09:39,349 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.343e+01 8.509e+01 9.283e+01 1.013e+02 1.341e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-19 07:09:58,418 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 9350, loss[loss=0.1057, simple_loss=0.1266, pruned_loss=0.03322, audio_tagging_loss=0.009203, over 15806.00 frames. ], tot_loss[loss=0.09052, simple_loss=0.109, pruned_loss=0.02561, audio_tagging_loss=0.01043, over 3054244.30 frames. ], batch size: 57, lr: 8.51e-03, grad_scale: 16.0 2023-11-19 07:10:02,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=623413.3333333334, ans=0.125 2023-11-19 07:10:02,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=623413.3333333334, ans=0.0 2023-11-19 07:10:17,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=623480.0, ans=0.2 2023-11-19 07:10:18,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=623480.0, ans=0.125 2023-11-19 07:10:47,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=623680.0, ans=0.125 2023-11-19 07:10:54,092 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 9400, loss[loss=0.1268, simple_loss=0.1587, pruned_loss=0.04006, audio_tagging_loss=0.007394, over 16578.00 frames. 
], tot_loss[loss=0.09125, simple_loss=0.1098, pruned_loss=0.02592, audio_tagging_loss=0.01045, over 3057687.16 frames. ], batch size: 60, lr: 8.51e-03, grad_scale: 16.0 2023-11-19 07:10:57,672 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.64 vs. limit=15.0 2023-11-19 07:11:10,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=623813.3333333334, ans=0.0 2023-11-19 07:11:31,168 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.566e+01 8.638e+01 9.433e+01 1.071e+02 1.581e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-19 07:11:32,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=623946.6666666666, ans=0.035 2023-11-19 07:11:42,596 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.64 vs. limit=15.0 2023-11-19 07:11:48,809 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 07:11:49,872 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 9450, loss[loss=0.09668, simple_loss=0.123, pruned_loss=0.02407, audio_tagging_loss=0.01113, over 14807.00 frames. ], tot_loss[loss=0.09131, simple_loss=0.1098, pruned_loss=0.02583, audio_tagging_loss=0.01056, over 3061155.33 frames. ], batch size: 53, lr: 8.50e-03, grad_scale: 16.0 2023-11-19 07:12:10,583 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=624146.6666666666, ans=0.95 2023-11-19 07:12:21,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=624213.3333333334, ans=0.0 2023-11-19 07:12:30,164 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=624280.0, ans=0.125 2023-11-19 07:12:44,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=624346.6666666666, ans=0.1 2023-11-19 07:12:45,961 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 9500, loss[loss=0.08256, simple_loss=0.09451, pruned_loss=0.02265, audio_tagging_loss=0.01266, over 13907.00 frames. ], tot_loss[loss=0.09101, simple_loss=0.1095, pruned_loss=0.02558, audio_tagging_loss=0.01067, over 3060356.15 frames. 
], batch size: 56, lr: 8.50e-03, grad_scale: 16.0 2023-11-19 07:12:57,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=624480.0, ans=0.1 2023-11-19 07:13:01,578 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=624480.0, ans=0.125 2023-11-19 07:13:08,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=624546.6666666666, ans=0.125 2023-11-19 07:13:13,798 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.65 vs. limit=15.0 2023-11-19 07:13:15,113 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=624546.6666666666, ans=0.1 2023-11-19 07:13:22,387 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.807e+01 8.457e+01 9.140e+01 9.820e+01 1.196e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-19 07:13:27,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=624613.3333333334, ans=0.0 2023-11-19 07:13:28,532 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:13:31,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=624680.0, ans=0.2 2023-11-19 07:13:36,451 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0 2023-11-19 07:13:41,593 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 9550, loss[loss=0.08833, simple_loss=0.102, pruned_loss=0.02554, audio_tagging_loss=0.01181, over 15277.00 frames. ], tot_loss[loss=0.09048, simple_loss=0.1088, pruned_loss=0.02529, audio_tagging_loss=0.0108, over 3059826.23 frames. ], batch size: 57, lr: 8.50e-03, grad_scale: 16.0 2023-11-19 07:13:43,331 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.93 vs. limit=15.0 2023-11-19 07:13:48,637 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.44 vs. limit=15.0 2023-11-19 07:13:50,267 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=624746.6666666666, ans=0.0 2023-11-19 07:14:19,980 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.90 vs. limit=6.0 2023-11-19 07:14:20,689 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.41 vs. 
limit=15.0 2023-11-19 07:14:22,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=624946.6666666666, ans=0.0 2023-11-19 07:14:32,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=625013.3333333334, ans=0.0 2023-11-19 07:14:37,083 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 9600, loss[loss=0.08177, simple_loss=0.1029, pruned_loss=0.02018, audio_tagging_loss=0.01012, over 14076.00 frames. ], tot_loss[loss=0.0904, simple_loss=0.1087, pruned_loss=0.0252, audio_tagging_loss=0.01083, over 3053191.79 frames. ], batch size: 53, lr: 8.50e-03, grad_scale: 32.0 2023-11-19 07:14:58,763 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.42 vs. limit=15.0 2023-11-19 07:14:59,655 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.32 vs. limit=12.0 2023-11-19 07:15:13,359 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.874e+01 8.461e+01 9.298e+01 1.020e+02 1.547e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 07:15:28,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=625346.6666666666, ans=0.1 2023-11-19 07:15:33,204 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 9650, loss[loss=0.1046, simple_loss=0.1205, pruned_loss=0.02977, audio_tagging_loss=0.01459, over 14425.00 frames. ], tot_loss[loss=0.09019, simple_loss=0.1083, pruned_loss=0.02519, audio_tagging_loss=0.01085, over 3048997.38 frames. ], batch size: 52, lr: 8.50e-03, grad_scale: 32.0 2023-11-19 07:15:36,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=625413.3333333334, ans=0.1 2023-11-19 07:15:45,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=625480.0, ans=0.0 2023-11-19 07:16:04,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=625546.6666666666, ans=0.0 2023-11-19 07:16:13,346 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=22.5 2023-11-19 07:16:16,880 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=625680.0, ans=0.125 2023-11-19 07:16:20,014 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:16:21,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=625680.0, ans=0.0 2023-11-19 07:16:22,212 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:16:28,188 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 9700, loss[loss=0.09242, simple_loss=0.1049, pruned_loss=0.02784, audio_tagging_loss=0.0121, over 15775.00 frames. ], tot_loss[loss=0.08978, simple_loss=0.1081, pruned_loss=0.0251, audio_tagging_loss=0.01065, over 3048810.03 frames. 
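[Annotation] The Whitening lines report how far a module's activations are from having a "white" covariance: the metric is 1.0 when the covariance eigenvalues are all equal and grows with anisotropy, and a penalty is applied when it exceeds "limit". A plausible formulation of such a metric, not necessarily the exact one in scaling.py:

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (N, C) activations. Returns E[lambda^2] / E[lambda]^2 over the
    # covariance eigenvalues: 1.0 when all eigenvalues are equal ("white"),
    # larger when the covariance is anisotropic.
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20)

x = torch.randn(1000, 192)
print(whitening_metric(x))  # roughly 1, slightly above due to sampling noise
```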
], batch size: 60, lr: 8.49e-03, grad_scale: 32.0 2023-11-19 07:16:46,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=625813.3333333334, ans=0.125 2023-11-19 07:16:57,240 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.78 vs. limit=10.0 2023-11-19 07:16:58,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=625880.0, ans=0.0 2023-11-19 07:17:00,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=625880.0, ans=0.05 2023-11-19 07:17:02,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=625946.6666666666, ans=0.1 2023-11-19 07:17:05,231 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.958e+01 8.422e+01 9.037e+01 9.716e+01 1.315e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-19 07:17:11,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=626013.3333333334, ans=0.95 2023-11-19 07:17:19,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=626013.3333333334, ans=0.1 2023-11-19 07:17:24,174 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 9750, loss[loss=0.0787, simple_loss=0.09381, pruned_loss=0.02027, audio_tagging_loss=0.01152, over 15194.00 frames. ], tot_loss[loss=0.08888, simple_loss=0.1071, pruned_loss=0.02477, audio_tagging_loss=0.01058, over 3051950.24 frames. ], batch size: 57, lr: 8.49e-03, grad_scale: 32.0 2023-11-19 07:17:27,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=626080.0, ans=0.125 2023-11-19 07:17:42,583 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.29 vs. limit=15.0 2023-11-19 07:17:44,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=626146.6666666666, ans=0.125 2023-11-19 07:17:47,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=626213.3333333334, ans=0.95 2023-11-19 07:18:00,939 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.83 vs. limit=22.5 2023-11-19 07:18:12,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=626346.6666666666, ans=0.125 2023-11-19 07:18:15,644 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:18:19,721 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 9800, loss[loss=0.08072, simple_loss=0.1006, pruned_loss=0.02068, audio_tagging_loss=0.009759, over 14013.00 frames. ], tot_loss[loss=0.08883, simple_loss=0.107, pruned_loss=0.02484, audio_tagging_loss=0.01048, over 3045862.84 frames. 
], batch size: 53, lr: 8.49e-03, grad_scale: 32.0 2023-11-19 07:18:21,491 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=626413.3333333334, ans=0.1 2023-11-19 07:18:31,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=626480.0, ans=0.2 2023-11-19 07:18:43,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=626546.6666666666, ans=0.05 2023-11-19 07:18:56,499 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.898e+01 8.157e+01 8.941e+01 9.770e+01 1.328e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-19 07:19:00,350 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.34 vs. limit=15.0 2023-11-19 07:19:10,037 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 07:19:15,292 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 9850, loss[loss=0.08746, simple_loss=0.1138, pruned_loss=0.02112, audio_tagging_loss=0.009432, over 14606.00 frames. ], tot_loss[loss=0.0891, simple_loss=0.1073, pruned_loss=0.02495, audio_tagging_loss=0.01049, over 3046312.20 frames. ], batch size: 56, lr: 8.49e-03, grad_scale: 32.0 2023-11-19 07:19:19,800 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=626746.6666666666, ans=0.125 2023-11-19 07:19:56,379 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.06 vs. limit=15.0 2023-11-19 07:20:09,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=627080.0, ans=0.125 2023-11-19 07:20:10,742 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 9900, loss[loss=0.08452, simple_loss=0.09762, pruned_loss=0.02501, audio_tagging_loss=0.0107, over 15329.00 frames. ], tot_loss[loss=0.08883, simple_loss=0.1069, pruned_loss=0.02485, audio_tagging_loss=0.01051, over 3051872.44 frames. ], batch size: 60, lr: 8.48e-03, grad_scale: 32.0 2023-11-19 07:20:12,515 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=627080.0, ans=0.0 2023-11-19 07:20:28,691 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.92 vs. 
limit=15.0 2023-11-19 07:20:44,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=627280.0, ans=0.0 2023-11-19 07:20:47,071 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.674e+01 9.203e+01 1.082e+02 1.582e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-19 07:21:06,725 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 9950, loss[loss=0.1022, simple_loss=0.1136, pruned_loss=0.03108, audio_tagging_loss=0.01435, over 14506.00 frames. ], tot_loss[loss=0.08901, simple_loss=0.107, pruned_loss=0.02503, audio_tagging_loss=0.01049, over 3050380.87 frames. ], batch size: 54, lr: 8.48e-03, grad_scale: 32.0 2023-11-19 07:21:12,730 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=627413.3333333334, ans=0.1 2023-11-19 07:21:13,261 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.10 vs. limit=15.0 2023-11-19 07:21:31,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=627546.6666666666, ans=0.0 2023-11-19 07:21:58,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=627680.0, ans=0.2 2023-11-19 07:22:02,091 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 10000, loss[loss=0.1009, simple_loss=0.1315, pruned_loss=0.02702, audio_tagging_loss=0.008109, over 15264.00 frames. ], tot_loss[loss=0.08961, simple_loss=0.108, pruned_loss=0.02517, audio_tagging_loss=0.01042, over 3049957.06 frames. ], batch size: 54, lr: 8.48e-03, grad_scale: 32.0 2023-11-19 07:22:06,774 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.97 vs. limit=22.5 2023-11-19 07:22:07,119 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.74 vs. limit=5.0 2023-11-19 07:22:13,861 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=9.852e-02 2023-11-19 07:22:14,200 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.37 vs. limit=22.5 2023-11-19 07:22:26,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=627880.0, ans=0.1 2023-11-19 07:22:38,986 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.402e+01 8.658e+01 9.575e+01 1.064e+02 1.480e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-19 07:22:45,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=628013.3333333334, ans=0.09899494936611666 2023-11-19 07:22:57,037 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 10050, loss[loss=0.09493, simple_loss=0.1193, pruned_loss=0.02609, audio_tagging_loss=0.00921, over 15640.00 frames. ], tot_loss[loss=0.08993, simple_loss=0.1083, pruned_loss=0.0253, audio_tagging_loss=0.01046, over 3049872.67 frames. 
], batch size: 58, lr: 8.48e-03, grad_scale: 32.0 2023-11-19 07:22:59,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=628080.0, ans=0.0 2023-11-19 07:22:59,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=628080.0, ans=0.125 2023-11-19 07:23:15,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=628146.6666666666, ans=0.2 2023-11-19 07:23:17,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=628146.6666666666, ans=0.1 2023-11-19 07:23:31,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=628280.0, ans=0.125 2023-11-19 07:23:32,007 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=628280.0, ans=0.125 2023-11-19 07:23:34,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=628280.0, ans=0.125 2023-11-19 07:23:35,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=628280.0, ans=0.95 2023-11-19 07:23:53,591 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 10100, loss[loss=0.117, simple_loss=0.1494, pruned_loss=0.03441, audio_tagging_loss=0.007867, over 15401.00 frames. ], tot_loss[loss=0.08981, simple_loss=0.1081, pruned_loss=0.02525, audio_tagging_loss=0.01049, over 3048863.07 frames. ], batch size: 55, lr: 8.48e-03, grad_scale: 32.0 2023-11-19 07:24:08,214 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.94 vs. limit=15.0 2023-11-19 07:24:10,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=628480.0, ans=0.2 2023-11-19 07:24:23,183 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.54 vs. limit=6.0 2023-11-19 07:24:29,386 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.846e+01 9.478e+01 1.084e+02 1.850e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-19 07:24:29,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=628613.3333333334, ans=0.125 2023-11-19 07:24:31,646 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.19 vs. limit=6.0 2023-11-19 07:24:33,791 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.38 vs. limit=12.0 2023-11-19 07:24:37,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=628680.0, ans=0.125 2023-11-19 07:24:37,884 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 07:24:38,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=628680.0, ans=0.125 2023-11-19 07:24:48,921 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 10150, loss[loss=0.07479, simple_loss=0.09029, pruned_loss=0.01838, audio_tagging_loss=0.01126, over 15415.00 frames. ], tot_loss[loss=0.09032, simple_loss=0.1089, pruned_loss=0.02536, audio_tagging_loss=0.01051, over 3044691.59 frames. ], batch size: 57, lr: 8.47e-03, grad_scale: 32.0 2023-11-19 07:25:02,207 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.31 vs. limit=10.0 2023-11-19 07:25:15,296 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 07:25:16,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=628880.0, ans=0.1 2023-11-19 07:25:16,626 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=628880.0, ans=0.125 2023-11-19 07:25:23,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=628946.6666666666, ans=0.0 2023-11-19 07:25:27,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=628946.6666666666, ans=0.0 2023-11-19 07:25:29,238 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=628946.6666666666, ans=0.125 2023-11-19 07:25:31,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=628946.6666666666, ans=0.125 2023-11-19 07:25:43,825 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 10200, loss[loss=0.1012, simple_loss=0.1267, pruned_loss=0.02904, audio_tagging_loss=0.008758, over 15436.00 frames. ], tot_loss[loss=0.09041, simple_loss=0.1091, pruned_loss=0.02528, audio_tagging_loss=0.01058, over 3050667.22 frames. ], batch size: 60, lr: 8.47e-03, grad_scale: 32.0 2023-11-19 07:25:57,844 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=629146.6666666666, ans=0.0 2023-11-19 07:26:05,528 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 07:26:08,467 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=629213.3333333334, ans=0.0 2023-11-19 07:26:20,960 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.198e+01 8.630e+01 9.623e+01 1.074e+02 1.731e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-19 07:26:40,088 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 10250, loss[loss=0.09782, simple_loss=0.1216, pruned_loss=0.02811, audio_tagging_loss=0.008923, over 15307.00 frames. ], tot_loss[loss=0.09048, simple_loss=0.1092, pruned_loss=0.02524, audio_tagging_loss=0.01064, over 3050236.83 frames. ], batch size: 57, lr: 8.47e-03, grad_scale: 32.0 2023-11-19 07:26:43,012 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.06 vs. limit=15.0 2023-11-19 07:26:45,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=629413.3333333334, ans=0.0 2023-11-19 07:27:06,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=629546.6666666666, ans=0.95 2023-11-19 07:27:19,527 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=629613.3333333334, ans=0.1 2023-11-19 07:27:36,290 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 10300, loss[loss=0.07935, simple_loss=0.09153, pruned_loss=0.01965, audio_tagging_loss=0.01393, over 14269.00 frames. ], tot_loss[loss=0.0901, simple_loss=0.1085, pruned_loss=0.02504, audio_tagging_loss=0.01082, over 3057803.17 frames. ], batch size: 53, lr: 8.47e-03, grad_scale: 32.0 2023-11-19 07:28:00,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=629880.0, ans=0.125 2023-11-19 07:28:05,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=629880.0, ans=0.125 2023-11-19 07:28:07,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=629880.0, ans=0.125 2023-11-19 07:28:10,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=629946.6666666666, ans=0.0 2023-11-19 07:28:12,260 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.771e+01 9.491e+01 1.012e+02 1.579e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-19 07:28:21,887 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.35 vs. limit=15.0 2023-11-19 07:28:23,899 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.24 vs. limit=22.5 2023-11-19 07:28:30,765 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 10350, loss[loss=0.07989, simple_loss=0.08868, pruned_loss=0.02478, audio_tagging_loss=0.01076, over 14247.00 frames. ], tot_loss[loss=0.08982, simple_loss=0.1082, pruned_loss=0.0249, audio_tagging_loss=0.01082, over 3061186.23 frames. 
], batch size: 56, lr: 8.46e-03, grad_scale: 32.0 2023-11-19 07:28:32,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=630080.0, ans=0.1 2023-11-19 07:28:37,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=630080.0, ans=0.0 2023-11-19 07:28:42,663 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=630146.6666666666, ans=0.0 2023-11-19 07:28:45,805 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=630146.6666666666, ans=0.1 2023-11-19 07:28:55,065 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.76 vs. limit=22.5 2023-11-19 07:29:13,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=630280.0, ans=0.0 2023-11-19 07:29:22,579 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.47 vs. limit=6.0 2023-11-19 07:29:26,686 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 10400, loss[loss=0.07903, simple_loss=0.08989, pruned_loss=0.02378, audio_tagging_loss=0.01031, over 15091.00 frames. ], tot_loss[loss=0.09002, simple_loss=0.108, pruned_loss=0.02502, audio_tagging_loss=0.01098, over 3054274.12 frames. ], batch size: 56, lr: 8.46e-03, grad_scale: 32.0 2023-11-19 07:29:32,429 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.40 vs. limit=6.0 2023-11-19 07:29:51,538 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.15 vs. limit=15.0 2023-11-19 07:30:03,015 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.887e+01 8.595e+01 9.141e+01 1.035e+02 1.375e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-19 07:30:22,397 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 10450, loss[loss=0.1109, simple_loss=0.1329, pruned_loss=0.03165, audio_tagging_loss=0.01275, over 14442.00 frames. ], tot_loss[loss=0.09077, simple_loss=0.1096, pruned_loss=0.02529, audio_tagging_loss=0.0107, over 3052247.72 frames. ], batch size: 55, lr: 8.46e-03, grad_scale: 32.0 2023-11-19 07:30:22,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=630746.6666666666, ans=0.0 2023-11-19 07:30:52,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=630880.0, ans=0.125 2023-11-19 07:30:53,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=630880.0, ans=0.125 2023-11-19 07:30:53,952 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.95 vs. 
limit=10.0 2023-11-19 07:30:56,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=630946.6666666666, ans=0.125 2023-11-19 07:31:02,600 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=630946.6666666666, ans=0.0 2023-11-19 07:31:04,743 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=630946.6666666666, ans=0.0 2023-11-19 07:31:16,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=631080.0, ans=0.0 2023-11-19 07:31:17,695 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 10500, loss[loss=0.09087, simple_loss=0.1175, pruned_loss=0.02171, audio_tagging_loss=0.01039, over 16276.00 frames. ], tot_loss[loss=0.09086, simple_loss=0.1097, pruned_loss=0.0254, audio_tagging_loss=0.01063, over 3055818.41 frames. ], batch size: 59, lr: 8.46e-03, grad_scale: 32.0 2023-11-19 07:31:45,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=631213.3333333334, ans=0.0 2023-11-19 07:31:48,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=631213.3333333334, ans=0.0 2023-11-19 07:31:52,829 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=631280.0, ans=0.1 2023-11-19 07:31:54,645 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.423e+01 8.554e+01 9.380e+01 1.032e+02 1.223e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-19 07:32:13,166 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 10550, loss[loss=0.08504, simple_loss=0.105, pruned_loss=0.02031, audio_tagging_loss=0.01225, over 15256.00 frames. ], tot_loss[loss=0.09043, simple_loss=0.1096, pruned_loss=0.02526, audio_tagging_loss=0.01038, over 3054199.34 frames. ], batch size: 54, lr: 8.46e-03, grad_scale: 32.0 2023-11-19 07:32:17,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=631413.3333333334, ans=0.0 2023-11-19 07:32:18,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=631413.3333333334, ans=0.0 2023-11-19 07:32:36,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=631546.6666666666, ans=0.1 2023-11-19 07:32:43,962 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.31 vs. limit=15.0 2023-11-19 07:32:53,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=631613.3333333334, ans=0.125 2023-11-19 07:33:09,241 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 10600, loss[loss=0.09715, simple_loss=0.1188, pruned_loss=0.02682, audio_tagging_loss=0.01095, over 15295.00 frames. ], tot_loss[loss=0.09053, simple_loss=0.1098, pruned_loss=0.02537, audio_tagging_loss=0.01028, over 3050532.20 frames. ], batch size: 56, lr: 8.45e-03, grad_scale: 32.0 2023-11-19 07:33:38,286 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.66 vs. 
limit=15.0 2023-11-19 07:33:39,752 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.76 vs. limit=15.0 2023-11-19 07:33:45,593 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.034e+01 8.589e+01 9.112e+01 9.990e+01 1.319e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-19 07:33:48,215 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.43 vs. limit=15.0 2023-11-19 07:33:57,196 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=632013.3333333334, ans=0.125 2023-11-19 07:34:04,983 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 10650, loss[loss=0.05931, simple_loss=0.06403, pruned_loss=0.01324, audio_tagging_loss=0.01405, over 15356.00 frames. ], tot_loss[loss=0.08991, simple_loss=0.1089, pruned_loss=0.0251, audio_tagging_loss=0.01037, over 3050326.91 frames. ], batch size: 60, lr: 8.45e-03, grad_scale: 32.0 2023-11-19 07:34:11,739 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.90 vs. limit=22.5 2023-11-19 07:34:37,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=632280.0, ans=0.0 2023-11-19 07:34:38,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=632280.0, ans=0.1 2023-11-19 07:34:38,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=632280.0, ans=0.125 2023-11-19 07:35:00,539 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 10700, loss[loss=0.1158, simple_loss=0.1517, pruned_loss=0.03031, audio_tagging_loss=0.009637, over 15628.00 frames. ], tot_loss[loss=0.08895, simple_loss=0.1074, pruned_loss=0.02471, audio_tagging_loss=0.01053, over 3047974.09 frames. ], batch size: 56, lr: 8.45e-03, grad_scale: 32.0 2023-11-19 07:35:31,420 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.07 vs. limit=15.0 2023-11-19 07:35:37,032 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.376e+01 8.469e+01 9.057e+01 9.771e+01 1.264e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-19 07:35:56,122 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 10750, loss[loss=0.08905, simple_loss=0.1123, pruned_loss=0.02205, audio_tagging_loss=0.01084, over 15201.00 frames. ], tot_loss[loss=0.08966, simple_loss=0.1082, pruned_loss=0.02504, audio_tagging_loss=0.0105, over 3047995.78 frames. ], batch size: 55, lr: 8.45e-03, grad_scale: 32.0 2023-11-19 07:36:01,476 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.84 vs. 
limit=6.0 2023-11-19 07:36:04,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=632746.6666666666, ans=0.0 2023-11-19 07:36:09,540 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=632813.3333333334, ans=0.0 2023-11-19 07:36:14,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=632813.3333333334, ans=0.0 2023-11-19 07:36:18,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=632880.0, ans=0.0 2023-11-19 07:36:19,422 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=632880.0, ans=0.0 2023-11-19 07:36:46,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=633013.3333333334, ans=0.125 2023-11-19 07:36:51,430 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 10800, loss[loss=0.1061, simple_loss=0.1235, pruned_loss=0.03203, audio_tagging_loss=0.01231, over 15577.00 frames. ], tot_loss[loss=0.0896, simple_loss=0.1083, pruned_loss=0.02493, audio_tagging_loss=0.01053, over 3051917.57 frames. ], batch size: 59, lr: 8.44e-03, grad_scale: 32.0 2023-11-19 07:37:16,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=633213.3333333334, ans=0.2 2023-11-19 07:37:18,115 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2023-11-19 07:37:18,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=633213.3333333334, ans=0.0 2023-11-19 07:37:29,645 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.684e+01 9.473e+01 1.039e+02 1.467e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-19 07:37:35,489 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=633346.6666666666, ans=0.125 2023-11-19 07:37:35,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=633346.6666666666, ans=0.5 2023-11-19 07:37:48,136 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 10850, loss[loss=0.05538, simple_loss=0.05385, pruned_loss=0.01473, audio_tagging_loss=0.01373, over 13917.00 frames. ], tot_loss[loss=0.08916, simple_loss=0.1074, pruned_loss=0.02486, audio_tagging_loss=0.01061, over 3045398.83 frames. ], batch size: 55, lr: 8.44e-03, grad_scale: 16.0 2023-11-19 07:37:59,895 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.20 vs. 
limit=15.0 2023-11-19 07:38:22,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=633613.3333333334, ans=0.125 2023-11-19 07:38:35,478 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=633680.0, ans=0.125 2023-11-19 07:38:37,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=633680.0, ans=0.1 2023-11-19 07:38:39,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=633680.0, ans=0.2 2023-11-19 07:38:39,689 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=633680.0, ans=0.0 2023-11-19 07:38:40,560 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 07:38:41,154 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.08 vs. limit=15.0 2023-11-19 07:38:43,645 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 10900, loss[loss=0.08126, simple_loss=0.09277, pruned_loss=0.02492, audio_tagging_loss=0.009958, over 15479.00 frames. ], tot_loss[loss=0.08927, simple_loss=0.1074, pruned_loss=0.02493, audio_tagging_loss=0.01065, over 3046413.17 frames. ], batch size: 59, lr: 8.44e-03, grad_scale: 16.0 2023-11-19 07:38:58,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=633813.3333333334, ans=0.025 2023-11-19 07:39:13,532 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.55 vs. limit=22.5 2023-11-19 07:39:14,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=633880.0, ans=0.125 2023-11-19 07:39:18,993 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=633946.6666666666, ans=0.125 2023-11-19 07:39:21,885 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.513e+01 8.390e+01 9.440e+01 1.044e+02 1.572e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-19 07:39:27,658 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=22.5 2023-11-19 07:39:29,575 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.02 vs. limit=15.0 2023-11-19 07:39:30,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=634013.3333333334, ans=0.1 2023-11-19 07:39:39,382 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 10950, loss[loss=0.07866, simple_loss=0.1, pruned_loss=0.0185, audio_tagging_loss=0.01014, over 15090.00 frames. 
], tot_loss[loss=0.08855, simple_loss=0.1064, pruned_loss=0.02458, audio_tagging_loss=0.01079, over 3045696.53 frames. ], batch size: 59, lr: 8.44e-03, grad_scale: 16.0 2023-11-19 07:39:46,252 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.60 vs. limit=15.0 2023-11-19 07:40:02,454 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=634213.3333333334, ans=0.125 2023-11-19 07:40:18,964 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=634280.0, ans=0.1 2023-11-19 07:40:34,749 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 11000, loss[loss=0.07151, simple_loss=0.08647, pruned_loss=0.01506, audio_tagging_loss=0.01321, over 15229.00 frames. ], tot_loss[loss=0.08849, simple_loss=0.1063, pruned_loss=0.02455, audio_tagging_loss=0.01078, over 3047504.76 frames. ], batch size: 57, lr: 8.44e-03, grad_scale: 16.0 2023-11-19 07:40:44,798 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 07:41:12,768 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.925e+01 8.297e+01 9.076e+01 1.002e+02 1.429e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-19 07:41:14,082 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:41:14,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=634613.3333333334, ans=0.0 2023-11-19 07:41:16,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=634613.3333333334, ans=0.125 2023-11-19 07:41:24,108 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.86 vs. limit=15.0 2023-11-19 07:41:31,720 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 11050, loss[loss=0.08494, simple_loss=0.09971, pruned_loss=0.02176, audio_tagging_loss=0.01332, over 15328.00 frames. ], tot_loss[loss=0.08905, simple_loss=0.1073, pruned_loss=0.02469, audio_tagging_loss=0.01069, over 3051204.10 frames. 
], batch size: 56, lr: 8.43e-03, grad_scale: 16.0 2023-11-19 07:41:31,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=634746.6666666666, ans=0.125 2023-11-19 07:41:38,831 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=634746.6666666666, ans=0.125 2023-11-19 07:41:45,136 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=634813.3333333334, ans=0.125 2023-11-19 07:41:45,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=634813.3333333334, ans=0.125 2023-11-19 07:42:05,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=634946.6666666666, ans=0.125 2023-11-19 07:42:08,387 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.55 vs. limit=15.0 2023-11-19 07:42:13,328 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=634946.6666666666, ans=0.0 2023-11-19 07:42:27,264 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 11100, loss[loss=0.09712, simple_loss=0.1157, pruned_loss=0.02873, audio_tagging_loss=0.01056, over 14893.00 frames. ], tot_loss[loss=0.08947, simple_loss=0.1076, pruned_loss=0.0248, audio_tagging_loss=0.01088, over 3041971.21 frames. ], batch size: 55, lr: 8.43e-03, grad_scale: 16.0 2023-11-19 07:42:28,636 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=635080.0, ans=0.1 2023-11-19 07:42:45,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=635146.6666666666, ans=0.0 2023-11-19 07:43:05,510 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.720e+01 9.689e+01 1.049e+02 1.321e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-19 07:43:16,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=635346.6666666666, ans=0.125 2023-11-19 07:43:20,819 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.60 vs. limit=15.0 2023-11-19 07:43:22,344 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 11150, loss[loss=0.0731, simple_loss=0.08665, pruned_loss=0.01888, audio_tagging_loss=0.01089, over 15477.00 frames. ], tot_loss[loss=0.09011, simple_loss=0.1083, pruned_loss=0.02507, audio_tagging_loss=0.01091, over 3044655.88 frames. 
], batch size: 58, lr: 8.43e-03, grad_scale: 16.0 2023-11-19 07:43:25,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=635413.3333333334, ans=0.0 2023-11-19 07:43:27,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=635413.3333333334, ans=0.0 2023-11-19 07:43:30,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=635413.3333333334, ans=10.0 2023-11-19 07:43:55,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=635613.3333333334, ans=0.02 2023-11-19 07:44:09,819 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.18 vs. limit=15.0 2023-11-19 07:44:18,683 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 11200, loss[loss=0.07871, simple_loss=0.09528, pruned_loss=0.01989, audio_tagging_loss=0.01118, over 15683.00 frames. ], tot_loss[loss=0.09014, simple_loss=0.1084, pruned_loss=0.025, audio_tagging_loss=0.01096, over 3049856.62 frames. ], batch size: 59, lr: 8.43e-03, grad_scale: 32.0 2023-11-19 07:44:27,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=635746.6666666666, ans=0.5 2023-11-19 07:44:30,943 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=635813.3333333334, ans=0.2 2023-11-19 07:44:44,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=635880.0, ans=0.125 2023-11-19 07:44:45,952 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.19 vs. limit=6.0 2023-11-19 07:44:56,021 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.574e+01 8.472e+01 9.235e+01 9.971e+01 1.338e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 07:44:58,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=635946.6666666666, ans=0.1 2023-11-19 07:45:09,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=636013.3333333334, ans=0.0 2023-11-19 07:45:14,413 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 11250, loss[loss=0.09749, simple_loss=0.1225, pruned_loss=0.02778, audio_tagging_loss=0.008454, over 15463.00 frames. ], tot_loss[loss=0.08939, simple_loss=0.1075, pruned_loss=0.02475, audio_tagging_loss=0.0109, over 3046998.93 frames. 
], batch size: 58, lr: 8.42e-03, grad_scale: 32.0 2023-11-19 07:45:26,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=636146.6666666666, ans=0.0 2023-11-19 07:45:26,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=636146.6666666666, ans=0.125 2023-11-19 07:45:47,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=636280.0, ans=10.0 2023-11-19 07:46:00,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=636346.6666666666, ans=0.125 2023-11-19 07:46:09,191 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 11300, loss[loss=0.04772, simple_loss=0.05141, pruned_loss=0.008234, audio_tagging_loss=0.01378, over 14498.00 frames. ], tot_loss[loss=0.08846, simple_loss=0.1066, pruned_loss=0.0244, audio_tagging_loss=0.01077, over 3054051.59 frames. ], batch size: 57, lr: 8.42e-03, grad_scale: 32.0 2023-11-19 07:46:39,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=636546.6666666666, ans=0.125 2023-11-19 07:46:47,208 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.708e+01 8.767e+01 9.641e+01 1.057e+02 1.574e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-19 07:46:47,634 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.83 vs. limit=15.0 2023-11-19 07:46:57,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=636680.0, ans=0.1 2023-11-19 07:47:02,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=636680.0, ans=0.2 2023-11-19 07:47:05,214 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 11350, loss[loss=0.08412, simple_loss=0.09847, pruned_loss=0.02577, audio_tagging_loss=0.009118, over 15413.00 frames. ], tot_loss[loss=0.0892, simple_loss=0.1074, pruned_loss=0.0249, audio_tagging_loss=0.01062, over 3049115.38 frames. ], batch size: 56, lr: 8.42e-03, grad_scale: 32.0 2023-11-19 07:47:10,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=636746.6666666666, ans=0.04949747468305833 2023-11-19 07:47:23,896 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.36 vs. limit=15.0 2023-11-19 07:47:34,229 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.86 vs. limit=22.5 2023-11-19 07:47:47,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=636946.6666666666, ans=0.0 2023-11-19 07:47:51,072 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.40 vs. limit=12.0 2023-11-19 07:48:01,153 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 11400, loss[loss=0.07231, simple_loss=0.08465, pruned_loss=0.01689, audio_tagging_loss=0.01309, over 15311.00 frames. ], tot_loss[loss=0.08923, simple_loss=0.1072, pruned_loss=0.02495, audio_tagging_loss=0.01066, over 3047515.30 frames. 
], batch size: 57, lr: 8.42e-03, grad_scale: 32.0 2023-11-19 07:48:01,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=637080.0, ans=0.2 2023-11-19 07:48:38,432 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.888e+01 8.501e+01 9.316e+01 1.025e+02 1.797e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-19 07:48:44,117 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.58 vs. limit=10.0 2023-11-19 07:48:49,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=637346.6666666666, ans=0.04949747468305833 2023-11-19 07:48:56,275 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 11450, loss[loss=0.09872, simple_loss=0.114, pruned_loss=0.02743, audio_tagging_loss=0.0143, over 14043.00 frames. ], tot_loss[loss=0.08954, simple_loss=0.108, pruned_loss=0.02497, audio_tagging_loss=0.01058, over 3046184.18 frames. ], batch size: 53, lr: 8.42e-03, grad_scale: 32.0 2023-11-19 07:49:00,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=637413.3333333334, ans=0.0 2023-11-19 07:49:00,152 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=22.5 2023-11-19 07:49:04,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=637413.3333333334, ans=0.125 2023-11-19 07:49:13,098 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.51 vs. limit=6.0 2023-11-19 07:49:13,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=637480.0, ans=0.0 2023-11-19 07:49:17,069 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=637480.0, ans=0.1 2023-11-19 07:49:26,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=637546.6666666666, ans=0.125 2023-11-19 07:49:35,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=637613.3333333334, ans=0.2 2023-11-19 07:49:48,404 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:49:53,054 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 11500, loss[loss=0.08492, simple_loss=0.1063, pruned_loss=0.0238, audio_tagging_loss=0.00795, over 14800.00 frames. ], tot_loss[loss=0.08997, simple_loss=0.1087, pruned_loss=0.02517, audio_tagging_loss=0.01044, over 3046759.23 frames. 
], batch size: 54, lr: 8.41e-03, grad_scale: 32.0 2023-11-19 07:50:00,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=637746.6666666666, ans=0.05 2023-11-19 07:50:01,891 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=637746.6666666666, ans=0.125 2023-11-19 07:50:11,356 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=637813.3333333334, ans=0.2 2023-11-19 07:50:16,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=637880.0, ans=0.0 2023-11-19 07:50:17,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=637880.0, ans=0.0 2023-11-19 07:50:17,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=637880.0, ans=0.125 2023-11-19 07:50:30,710 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.770e+01 8.483e+01 9.238e+01 9.842e+01 1.262e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-19 07:50:31,042 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=637946.6666666666, ans=0.05 2023-11-19 07:50:34,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=637946.6666666666, ans=0.0 2023-11-19 07:50:42,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=638013.3333333334, ans=0.125 2023-11-19 07:50:49,249 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 11550, loss[loss=0.07352, simple_loss=0.08513, pruned_loss=0.01842, audio_tagging_loss=0.01254, over 16082.00 frames. ], tot_loss[loss=0.08957, simple_loss=0.1084, pruned_loss=0.02493, audio_tagging_loss=0.01042, over 3055556.43 frames. ], batch size: 61, lr: 8.41e-03, grad_scale: 32.0 2023-11-19 07:51:03,026 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.374e-01 2023-11-19 07:51:04,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=638146.6666666666, ans=0.125 2023-11-19 07:51:07,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=638146.6666666666, ans=0.125 2023-11-19 07:51:21,305 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=12.0 2023-11-19 07:51:22,839 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 07:51:43,890 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 11600, loss[loss=0.08936, simple_loss=0.1146, pruned_loss=0.02145, audio_tagging_loss=0.01062, over 16335.00 frames. 
], tot_loss[loss=0.09034, simple_loss=0.1093, pruned_loss=0.02529, audio_tagging_loss=0.01039, over 3053956.68 frames. ], batch size: 59, lr: 8.41e-03, grad_scale: 32.0 2023-11-19 07:52:01,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=638480.0, ans=0.125 2023-11-19 07:52:14,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=638546.6666666666, ans=0.1 2023-11-19 07:52:15,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=638546.6666666666, ans=0.2 2023-11-19 07:52:21,738 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.115e+01 8.734e+01 9.337e+01 1.014e+02 1.440e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-19 07:52:22,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=638613.3333333334, ans=0.2 2023-11-19 07:52:27,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=638680.0, ans=0.125 2023-11-19 07:52:39,914 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 11650, loss[loss=0.08934, simple_loss=0.109, pruned_loss=0.02608, audio_tagging_loss=0.008769, over 14875.00 frames. ], tot_loss[loss=0.08922, simple_loss=0.108, pruned_loss=0.02477, audio_tagging_loss=0.01045, over 3045689.92 frames. ], batch size: 55, lr: 8.41e-03, grad_scale: 16.0 2023-11-19 07:52:57,625 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=638813.3333333334, ans=0.125 2023-11-19 07:53:10,862 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.92 vs. limit=15.0 2023-11-19 07:53:13,843 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=638946.6666666666, ans=0.0 2023-11-19 07:53:14,856 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=638946.6666666666, ans=0.2 2023-11-19 07:53:24,656 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.30 vs. limit=22.5 2023-11-19 07:53:25,915 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.41 vs. limit=15.0 2023-11-19 07:53:34,708 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 11700, loss[loss=0.06982, simple_loss=0.07844, pruned_loss=0.01578, audio_tagging_loss=0.01483, over 16158.00 frames. ], tot_loss[loss=0.0885, simple_loss=0.1069, pruned_loss=0.02442, audio_tagging_loss=0.01063, over 3047705.49 frames. 
], batch size: 62, lr: 8.40e-03, grad_scale: 16.0 2023-11-19 07:53:40,723 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=639080.0, ans=0.05 2023-11-19 07:54:13,831 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.036e+01 8.154e+01 8.844e+01 9.528e+01 1.167e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-19 07:54:30,857 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 11750, loss[loss=0.0623, simple_loss=0.06301, pruned_loss=0.01694, audio_tagging_loss=0.01385, over 13956.00 frames. ], tot_loss[loss=0.08806, simple_loss=0.1061, pruned_loss=0.02425, audio_tagging_loss=0.01074, over 3051910.88 frames. ], batch size: 55, lr: 8.40e-03, grad_scale: 16.0 2023-11-19 07:54:58,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=639546.6666666666, ans=0.125 2023-11-19 07:55:16,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=639680.0, ans=10.0 2023-11-19 07:55:26,012 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 11800, loss[loss=0.1002, simple_loss=0.1182, pruned_loss=0.03023, audio_tagging_loss=0.01089, over 15602.00 frames. ], tot_loss[loss=0.08883, simple_loss=0.107, pruned_loss=0.02472, audio_tagging_loss=0.01062, over 3048345.47 frames. ], batch size: 58, lr: 8.40e-03, grad_scale: 16.0 2023-11-19 07:55:52,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=639880.0, ans=0.0 2023-11-19 07:56:05,549 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.407e+01 8.609e+01 9.500e+01 1.074e+02 1.455e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-19 07:56:24,333 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 11850, loss[loss=0.1005, simple_loss=0.122, pruned_loss=0.0275, audio_tagging_loss=0.01201, over 15384.00 frames. ], tot_loss[loss=0.08863, simple_loss=0.1068, pruned_loss=0.02456, audio_tagging_loss=0.01066, over 3045755.38 frames. ], batch size: 56, lr: 8.40e-03, grad_scale: 16.0 2023-11-19 07:56:25,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=640080.0, ans=0.125 2023-11-19 07:56:40,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=640146.6666666666, ans=0.0 2023-11-19 07:56:51,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=640213.3333333334, ans=0.09899494936611666 2023-11-19 07:56:53,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=640213.3333333334, ans=0.0 2023-11-19 07:57:11,081 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.88 vs. limit=15.0 2023-11-19 07:57:13,544 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.17 vs. limit=6.0 2023-11-19 07:57:20,089 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 11900, loss[loss=0.09899, simple_loss=0.1151, pruned_loss=0.02712, audio_tagging_loss=0.01433, over 16103.00 frames. ], tot_loss[loss=0.08924, simple_loss=0.1074, pruned_loss=0.0247, audio_tagging_loss=0.01082, over 3047970.95 frames. 
], batch size: 60, lr: 8.40e-03, grad_scale: 16.0 2023-11-19 07:57:33,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=640480.0, ans=0.2 2023-11-19 07:57:47,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=640546.6666666666, ans=0.0 2023-11-19 07:57:48,956 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.39 vs. limit=15.0 2023-11-19 07:57:59,326 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.550e+01 8.414e+01 8.951e+01 9.837e+01 1.464e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-19 07:58:00,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=640613.3333333334, ans=0.125 2023-11-19 07:58:01,082 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.96 vs. limit=15.0 2023-11-19 07:58:09,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=640680.0, ans=0.1 2023-11-19 07:58:16,216 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 11950, loss[loss=0.1004, simple_loss=0.1304, pruned_loss=0.02536, audio_tagging_loss=0.009811, over 14835.00 frames. ], tot_loss[loss=0.09022, simple_loss=0.1086, pruned_loss=0.02507, audio_tagging_loss=0.01086, over 3048643.62 frames. ], batch size: 56, lr: 8.39e-03, grad_scale: 16.0 2023-11-19 07:58:43,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=640880.0, ans=0.125 2023-11-19 07:58:47,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=640880.0, ans=0.0 2023-11-19 07:58:58,022 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.36 vs. limit=15.0 2023-11-19 07:59:05,172 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.86 vs. limit=15.0 2023-11-19 07:59:06,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=641013.3333333334, ans=0.1 2023-11-19 07:59:10,347 INFO [train_asr.py:1115] (3/4) Epoch 8, batch 12000, loss[loss=0.0865, simple_loss=0.1006, pruned_loss=0.02281, audio_tagging_loss=0.01336, over 14296.00 frames. ], tot_loss[loss=0.0905, simple_loss=0.1086, pruned_loss=0.02527, audio_tagging_loss=0.01093, over 3045353.80 frames. ], batch size: 56, lr: 8.39e-03, grad_scale: 32.0 2023-11-19 07:59:10,347 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-19 07:59:25,795 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.2009, 2.2185, 3.4711, 2.3473], device='cuda:3') 2023-11-19 07:59:30,802 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.9487, 5.8605, 5.7762, 5.5610], device='cuda:3') 2023-11-19 07:59:42,995 INFO [train_asr.py:1147] (3/4) Epoch 8, validation: loss=0.06649, simple_loss=0.05653, pruned_loss=0.006961, audio_tagging_loss=0.03127, over 4681554.00 frames. 
2023-11-19 07:59:42,995 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-19 07:59:47,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=641080.0, ans=0.025 2023-11-19 07:59:55,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=641146.6666666666, ans=0.125 2023-11-19 08:00:44,328 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 0, loss[loss=0.1167, simple_loss=0.1333, pruned_loss=0.02718, audio_tagging_loss=0.02293, over 14887.00 frames. ], tot_loss[loss=0.1167, simple_loss=0.1333, pruned_loss=0.02718, audio_tagging_loss=0.02293, over 14887.00 frames. ], batch size: 56, lr: 7.94e-03, grad_scale: 32.0 2023-11-19 08:00:44,329 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-19 08:01:16,085 INFO [train_asr.py:1147] (3/4) Epoch 9, validation: loss=0.06566, simple_loss=0.05652, pruned_loss=0.006966, audio_tagging_loss=0.03043, over 4681554.00 frames. 2023-11-19 08:01:16,085 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-19 08:01:20,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=641240.0, ans=0.1 2023-11-19 08:01:28,796 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.618e+01 8.783e+01 9.637e+01 1.099e+02 1.400e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-19 08:01:33,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=641306.6666666666, ans=0.0 2023-11-19 08:01:33,651 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.41 vs. limit=22.5 2023-11-19 08:01:38,854 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.48 vs. limit=5.0 2023-11-19 08:01:55,495 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=641440.0, ans=0.05 2023-11-19 08:02:08,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=641506.6666666666, ans=0.125 2023-11-19 08:02:10,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=641506.6666666666, ans=0.2 2023-11-19 08:02:12,368 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 50, loss[loss=0.1022, simple_loss=0.1184, pruned_loss=0.02478, audio_tagging_loss=0.01823, over 15112.00 frames. ], tot_loss[loss=0.1006, simple_loss=0.11, pruned_loss=0.02558, audio_tagging_loss=0.02004, over 682716.38 frames. ], batch size: 56, lr: 7.94e-03, grad_scale: 32.0 2023-11-19 08:02:16,264 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.23 vs. 
limit=15.0 2023-11-19 08:02:30,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=641640.0, ans=0.125 2023-11-19 08:02:49,115 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=641773.3333333334, ans=0.125 2023-11-19 08:03:01,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=641840.0, ans=0.0 2023-11-19 08:03:04,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=641840.0, ans=0.0 2023-11-19 08:03:07,987 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 100, loss[loss=0.09751, simple_loss=0.1177, pruned_loss=0.02317, audio_tagging_loss=0.0155, over 15312.00 frames. ], tot_loss[loss=0.09863, simple_loss=0.1088, pruned_loss=0.02491, audio_tagging_loss=0.01933, over 1209030.46 frames. ], batch size: 56, lr: 7.94e-03, grad_scale: 32.0 2023-11-19 08:03:19,967 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.444e+01 8.656e+01 9.404e+01 1.019e+02 1.351e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-19 08:03:29,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=642040.0, ans=0.2 2023-11-19 08:03:31,555 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.21 vs. limit=6.0 2023-11-19 08:03:34,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=642040.0, ans=0.125 2023-11-19 08:03:47,638 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.02 vs. limit=22.5 2023-11-19 08:03:54,684 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=642173.3333333334, ans=0.1 2023-11-19 08:04:01,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=642173.3333333334, ans=0.025 2023-11-19 08:04:03,507 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 150, loss[loss=0.1133, simple_loss=0.1344, pruned_loss=0.03232, audio_tagging_loss=0.01382, over 14936.00 frames. ], tot_loss[loss=0.09565, simple_loss=0.1076, pruned_loss=0.02447, audio_tagging_loss=0.0174, over 1615454.80 frames. ], batch size: 56, lr: 7.94e-03, grad_scale: 32.0 2023-11-19 08:04:13,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=642240.0, ans=0.2 2023-11-19 08:04:24,298 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=15.0 2023-11-19 08:04:33,201 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.81 vs. 
limit=10.0 2023-11-19 08:04:38,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=642440.0, ans=0.125 2023-11-19 08:04:40,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=642440.0, ans=0.0 2023-11-19 08:04:51,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=642506.6666666666, ans=0.125 2023-11-19 08:04:51,150 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=642506.6666666666, ans=0.125 2023-11-19 08:04:55,264 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=642506.6666666666, ans=0.0 2023-11-19 08:04:59,252 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.78 vs. limit=12.0 2023-11-19 08:04:59,889 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 200, loss[loss=0.07583, simple_loss=0.09592, pruned_loss=0.01551, audio_tagging_loss=0.01236, over 15546.00 frames. ], tot_loss[loss=0.09498, simple_loss=0.1092, pruned_loss=0.02492, audio_tagging_loss=0.01544, over 1937550.21 frames. ], batch size: 57, lr: 7.94e-03, grad_scale: 32.0 2023-11-19 08:05:00,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=642573.3333333334, ans=0.2 2023-11-19 08:05:07,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=642573.3333333334, ans=0.2 2023-11-19 08:05:11,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=642640.0, ans=0.125 2023-11-19 08:05:11,611 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. limit=6.0 2023-11-19 08:05:13,046 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.507e+01 8.518e+01 9.330e+01 1.026e+02 1.321e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-19 08:05:13,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=642640.0, ans=0.125 2023-11-19 08:05:21,272 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.53 vs. limit=10.0 2023-11-19 08:05:24,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=642706.6666666666, ans=0.0 2023-11-19 08:05:45,004 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=642840.0, ans=0.0 2023-11-19 08:05:50,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=642840.0, ans=0.0 2023-11-19 08:05:50,726 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.07 vs. 
limit=22.5 2023-11-19 08:05:52,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=642840.0, ans=0.2 2023-11-19 08:05:54,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=642906.6666666666, ans=0.0 2023-11-19 08:05:55,838 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 250, loss[loss=0.1034, simple_loss=0.1256, pruned_loss=0.03136, audio_tagging_loss=0.009227, over 15192.00 frames. ], tot_loss[loss=0.0935, simple_loss=0.1092, pruned_loss=0.02499, audio_tagging_loss=0.01393, over 2188788.15 frames. ], batch size: 56, lr: 7.93e-03, grad_scale: 16.0 2023-11-19 08:06:04,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=642906.6666666666, ans=0.1 2023-11-19 08:06:05,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=642973.3333333334, ans=0.035 2023-11-19 08:06:23,203 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=643040.0, ans=0.2 2023-11-19 08:06:23,271 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=643040.0, ans=0.125 2023-11-19 08:06:25,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=643040.0, ans=0.1 2023-11-19 08:06:26,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=643040.0, ans=0.125 2023-11-19 08:06:45,088 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=643173.3333333334, ans=0.0 2023-11-19 08:06:51,156 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 300, loss[loss=0.07438, simple_loss=0.08776, pruned_loss=0.01951, audio_tagging_loss=0.01099, over 14608.00 frames. ], tot_loss[loss=0.09239, simple_loss=0.109, pruned_loss=0.02498, audio_tagging_loss=0.01293, over 2377810.00 frames. 
], batch size: 55, lr: 7.93e-03, grad_scale: 16.0 2023-11-19 08:06:53,471 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=643240.0, ans=0.125 2023-11-19 08:06:59,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=643240.0, ans=0.125 2023-11-19 08:07:05,327 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.313e+01 8.625e+01 9.241e+01 1.032e+02 1.343e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-19 08:07:05,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=643306.6666666666, ans=0.2 2023-11-19 08:07:07,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=643306.6666666666, ans=0.09899494936611666 2023-11-19 08:07:21,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=643373.3333333334, ans=0.2 2023-11-19 08:07:22,875 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=643373.3333333334, ans=0.05 2023-11-19 08:07:23,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=643373.3333333334, ans=0.125 2023-11-19 08:07:34,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=643506.6666666666, ans=0.0 2023-11-19 08:07:45,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=643506.6666666666, ans=0.2 2023-11-19 08:07:47,537 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 350, loss[loss=0.09095, simple_loss=0.1204, pruned_loss=0.02057, audio_tagging_loss=0.01016, over 15037.00 frames. ], tot_loss[loss=0.09233, simple_loss=0.1098, pruned_loss=0.02514, audio_tagging_loss=0.01231, over 2528467.69 frames. ], batch size: 56, lr: 7.93e-03, grad_scale: 16.0 2023-11-19 08:07:51,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=643573.3333333334, ans=0.125 2023-11-19 08:08:06,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=643640.0, ans=0.125 2023-11-19 08:08:19,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=643773.3333333334, ans=0.1 2023-11-19 08:08:42,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=643906.6666666666, ans=0.125 2023-11-19 08:08:43,342 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 400, loss[loss=0.09612, simple_loss=0.1122, pruned_loss=0.02753, audio_tagging_loss=0.01249, over 14910.00 frames. ], tot_loss[loss=0.09271, simple_loss=0.1109, pruned_loss=0.02551, audio_tagging_loss=0.01174, over 2648351.13 frames. 
], batch size: 56, lr: 7.93e-03, grad_scale: 32.0 2023-11-19 08:08:52,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=643906.6666666666, ans=0.125 2023-11-19 08:08:55,931 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 8.363e+01 9.025e+01 9.871e+01 1.227e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-19 08:09:05,784 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.72 vs. limit=15.0 2023-11-19 08:09:09,274 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=644040.0, ans=0.0 2023-11-19 08:09:32,827 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.12 vs. limit=10.0 2023-11-19 08:09:35,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=644173.3333333334, ans=0.0 2023-11-19 08:09:35,775 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 08:09:36,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=644173.3333333334, ans=0.0 2023-11-19 08:09:36,818 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=644173.3333333334, ans=0.1 2023-11-19 08:09:38,858 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 450, loss[loss=0.08203, simple_loss=0.1091, pruned_loss=0.01911, audio_tagging_loss=0.00838, over 15806.00 frames. ], tot_loss[loss=0.09174, simple_loss=0.1099, pruned_loss=0.02534, audio_tagging_loss=0.01147, over 2734654.21 frames. ], batch size: 57, lr: 7.92e-03, grad_scale: 32.0 2023-11-19 08:09:40,257 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=644240.0, ans=0.0 2023-11-19 08:09:41,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=644240.0, ans=0.1 2023-11-19 08:09:55,938 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.36 vs. 
limit=12.0 2023-11-19 08:09:57,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=644306.6666666666, ans=0.0 2023-11-19 08:09:58,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=644306.6666666666, ans=0.1 2023-11-19 08:10:03,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=644373.3333333334, ans=0.125 2023-11-19 08:10:22,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=644506.6666666666, ans=0.125 2023-11-19 08:10:23,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=644506.6666666666, ans=0.125 2023-11-19 08:10:25,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=644506.6666666666, ans=0.125 2023-11-19 08:10:35,243 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 500, loss[loss=0.08235, simple_loss=0.1022, pruned_loss=0.01887, audio_tagging_loss=0.01238, over 14671.00 frames. ], tot_loss[loss=0.09116, simple_loss=0.1095, pruned_loss=0.02523, audio_tagging_loss=0.01116, over 2801945.70 frames. ], batch size: 53, lr: 7.92e-03, grad_scale: 32.0 2023-11-19 08:10:40,918 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.32 vs. limit=10.0 2023-11-19 08:10:45,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=644640.0, ans=0.2 2023-11-19 08:10:48,619 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.872e+01 8.533e+01 9.443e+01 1.042e+02 1.372e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-19 08:10:54,767 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.56 vs. limit=12.0 2023-11-19 08:11:05,496 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.12 vs. limit=22.5 2023-11-19 08:11:08,549 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.10 vs. limit=22.5 2023-11-19 08:11:12,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=644773.3333333334, ans=0.0 2023-11-19 08:11:25,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=644840.0, ans=0.1 2023-11-19 08:11:31,022 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 550, loss[loss=0.08235, simple_loss=0.1003, pruned_loss=0.0196, audio_tagging_loss=0.01261, over 14514.00 frames. ], tot_loss[loss=0.09014, simple_loss=0.1084, pruned_loss=0.02491, audio_tagging_loss=0.01103, over 2862486.12 frames. 
], batch size: 55, lr: 7.92e-03, grad_scale: 32.0 2023-11-19 08:11:38,142 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 08:11:58,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=645040.0, ans=0.1 2023-11-19 08:12:04,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=645106.6666666666, ans=0.0 2023-11-19 08:12:10,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=645106.6666666666, ans=0.125 2023-11-19 08:12:23,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=645173.3333333334, ans=0.025 2023-11-19 08:12:26,581 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.44 vs. limit=10.0 2023-11-19 08:12:26,848 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 600, loss[loss=0.08762, simple_loss=0.1052, pruned_loss=0.02535, audio_tagging_loss=0.009661, over 15088.00 frames. ], tot_loss[loss=0.09073, simple_loss=0.1094, pruned_loss=0.02513, audio_tagging_loss=0.0109, over 2905857.31 frames. ], batch size: 58, lr: 7.92e-03, grad_scale: 32.0 2023-11-19 08:12:37,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=645306.6666666666, ans=0.0 2023-11-19 08:12:40,044 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.132e+01 8.342e+01 9.026e+01 9.768e+01 1.504e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-19 08:12:43,494 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.24 vs. limit=15.0 2023-11-19 08:12:54,835 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0 2023-11-19 08:12:58,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=645373.3333333334, ans=0.0 2023-11-19 08:12:58,450 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.14 vs. limit=12.0 2023-11-19 08:13:00,192 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=645440.0, ans=0.125 2023-11-19 08:13:01,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=645440.0, ans=0.2 2023-11-19 08:13:09,810 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=645440.0, ans=0.125 2023-11-19 08:13:10,117 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.63 vs. limit=15.0 2023-11-19 08:13:11,460 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.30 vs. 
limit=15.0 2023-11-19 08:13:20,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=645506.6666666666, ans=0.1 2023-11-19 08:13:22,986 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 650, loss[loss=0.08922, simple_loss=0.11, pruned_loss=0.02265, audio_tagging_loss=0.01158, over 14286.00 frames. ], tot_loss[loss=0.08992, simple_loss=0.1082, pruned_loss=0.02489, audio_tagging_loss=0.01092, over 2937775.14 frames. ], batch size: 54, lr: 7.92e-03, grad_scale: 32.0 2023-11-19 08:14:16,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=645840.0, ans=0.125 2023-11-19 08:14:19,615 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 700, loss[loss=0.09274, simple_loss=0.1057, pruned_loss=0.02873, audio_tagging_loss=0.01118, over 15272.00 frames. ], tot_loss[loss=0.09004, simple_loss=0.1088, pruned_loss=0.02488, audio_tagging_loss=0.01078, over 2961179.10 frames. ], batch size: 57, lr: 7.91e-03, grad_scale: 16.0 2023-11-19 08:14:33,922 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.588e+01 8.287e+01 8.978e+01 1.006e+02 1.254e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-19 08:14:40,596 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=646040.0, ans=0.1 2023-11-19 08:14:44,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=646040.0, ans=0.1 2023-11-19 08:14:50,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=646040.0, ans=0.125 2023-11-19 08:15:12,749 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.38 vs. limit=10.0 2023-11-19 08:15:15,502 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 750, loss[loss=0.09492, simple_loss=0.1131, pruned_loss=0.02878, audio_tagging_loss=0.009585, over 16519.00 frames. ], tot_loss[loss=0.09042, simple_loss=0.1092, pruned_loss=0.02505, audio_tagging_loss=0.01075, over 2986358.68 frames. ], batch size: 59, lr: 7.91e-03, grad_scale: 16.0 2023-11-19 08:15:21,068 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=646240.0, ans=0.0 2023-11-19 08:15:23,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=646240.0, ans=0.2 2023-11-19 08:15:32,692 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=646306.6666666666, ans=0.1 2023-11-19 08:15:36,157 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.37 vs. limit=12.0 2023-11-19 08:16:08,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=646506.6666666666, ans=0.125 2023-11-19 08:16:09,818 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.77 vs. limit=22.5 2023-11-19 08:16:11,321 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 800, loss[loss=0.08339, simple_loss=0.1076, pruned_loss=0.0212, audio_tagging_loss=0.008376, over 16523.00 frames. 
], tot_loss[loss=0.08986, simple_loss=0.1086, pruned_loss=0.02477, audio_tagging_loss=0.01079, over 3002514.01 frames. ], batch size: 63, lr: 7.91e-03, grad_scale: 32.0 2023-11-19 08:16:25,539 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 8.571e+01 9.363e+01 1.050e+02 1.472e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-19 08:16:25,797 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=646640.0, ans=0.0 2023-11-19 08:17:05,149 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=646840.0, ans=0.125 2023-11-19 08:17:07,117 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 850, loss[loss=0.08684, simple_loss=0.09996, pruned_loss=0.02573, audio_tagging_loss=0.01113, over 15260.00 frames. ], tot_loss[loss=0.08967, simple_loss=0.1081, pruned_loss=0.02474, audio_tagging_loss=0.01088, over 3010759.64 frames. ], batch size: 57, lr: 7.91e-03, grad_scale: 32.0 2023-11-19 08:17:19,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=646973.3333333334, ans=0.0 2023-11-19 08:17:20,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=646973.3333333334, ans=0.1 2023-11-19 08:17:33,655 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=647040.0, ans=0.1 2023-11-19 08:17:42,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=647106.6666666666, ans=0.125 2023-11-19 08:18:02,488 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 900, loss[loss=0.07818, simple_loss=0.1011, pruned_loss=0.01877, audio_tagging_loss=0.008869, over 15814.00 frames. ], tot_loss[loss=0.08993, simple_loss=0.1084, pruned_loss=0.02474, audio_tagging_loss=0.01097, over 3023486.29 frames. ], batch size: 58, lr: 7.91e-03, grad_scale: 32.0 2023-11-19 08:18:16,857 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.327e+01 8.077e+01 9.345e+01 1.007e+02 1.276e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-19 08:18:25,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=647373.3333333334, ans=0.04949747468305833 2023-11-19 08:18:26,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=647373.3333333334, ans=0.125 2023-11-19 08:18:31,937 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=15.0 2023-11-19 08:18:37,775 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.40 vs. limit=15.0 2023-11-19 08:18:49,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=647506.6666666666, ans=0.0 2023-11-19 08:18:58,296 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 950, loss[loss=0.09733, simple_loss=0.1215, pruned_loss=0.02543, audio_tagging_loss=0.01114, over 15528.00 frames. ], tot_loss[loss=0.08924, simple_loss=0.1076, pruned_loss=0.02453, audio_tagging_loss=0.01088, over 3028276.80 frames. 
], batch size: 57, lr: 7.90e-03, grad_scale: 32.0 2023-11-19 08:19:08,597 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=647640.0, ans=0.125 2023-11-19 08:19:09,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=647640.0, ans=0.0 2023-11-19 08:19:11,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=647640.0, ans=0.2 2023-11-19 08:19:11,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=647640.0, ans=0.125 2023-11-19 08:19:15,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=647640.0, ans=0.125 2023-11-19 08:19:21,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=647706.6666666666, ans=0.1 2023-11-19 08:19:28,629 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=647706.6666666666, ans=0.1 2023-11-19 08:19:44,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=647840.0, ans=0.0 2023-11-19 08:19:53,959 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 1000, loss[loss=0.1029, simple_loss=0.1337, pruned_loss=0.02972, audio_tagging_loss=0.006336, over 14714.00 frames. ], tot_loss[loss=0.08897, simple_loss=0.1076, pruned_loss=0.02442, audio_tagging_loss=0.01074, over 3030897.02 frames. ], batch size: 54, lr: 7.90e-03, grad_scale: 32.0 2023-11-19 08:20:08,926 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.042e+01 8.007e+01 8.931e+01 9.562e+01 1.265e+02, threshold=1.786e+02, percent-clipped=0.0 2023-11-19 08:20:17,775 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 08:20:27,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=648106.6666666666, ans=0.125 2023-11-19 08:20:48,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=648173.3333333334, ans=0.125 2023-11-19 08:20:50,068 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 1050, loss[loss=0.08448, simple_loss=0.09634, pruned_loss=0.02532, audio_tagging_loss=0.01099, over 14501.00 frames. ], tot_loss[loss=0.08938, simple_loss=0.1082, pruned_loss=0.02471, audio_tagging_loss=0.01055, over 3031828.55 frames. ], batch size: 53, lr: 7.90e-03, grad_scale: 32.0 2023-11-19 08:21:10,086 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.80 vs. limit=22.5 2023-11-19 08:21:12,706 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.09 vs. 
limit=15.0 2023-11-19 08:21:13,778 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.12 vs. limit=15.0 2023-11-19 08:21:22,977 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.07 vs. limit=22.5 2023-11-19 08:21:24,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=648440.0, ans=0.125 2023-11-19 08:21:28,747 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.49 vs. limit=12.0 2023-11-19 08:21:35,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=648506.6666666666, ans=0.125 2023-11-19 08:21:40,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=648506.6666666666, ans=0.2 2023-11-19 08:21:46,132 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 1100, loss[loss=0.07968, simple_loss=0.09443, pruned_loss=0.02099, audio_tagging_loss=0.01147, over 16388.00 frames. ], tot_loss[loss=0.08963, simple_loss=0.1085, pruned_loss=0.02493, audio_tagging_loss=0.01047, over 3041147.01 frames. ], batch size: 63, lr: 7.90e-03, grad_scale: 32.0 2023-11-19 08:21:46,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=648573.3333333334, ans=0.125 2023-11-19 08:21:48,231 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 08:21:55,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=648573.3333333334, ans=0.1 2023-11-19 08:21:55,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=648573.3333333334, ans=0.09899494936611666 2023-11-19 08:22:00,250 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.996e+01 8.479e+01 9.483e+01 1.065e+02 1.916e+02, threshold=1.897e+02, percent-clipped=1.0 2023-11-19 08:22:19,097 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.56 vs. limit=15.0 2023-11-19 08:22:41,991 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 1150, loss[loss=0.09145, simple_loss=0.1084, pruned_loss=0.02875, audio_tagging_loss=0.008489, over 14172.00 frames. ], tot_loss[loss=0.08926, simple_loss=0.1081, pruned_loss=0.02475, audio_tagging_loss=0.01044, over 3037370.48 frames. 
], batch size: 55, lr: 7.90e-03, grad_scale: 32.0 2023-11-19 08:22:49,558 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=648906.6666666666, ans=0.125 2023-11-19 08:23:04,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=649040.0, ans=0.1 2023-11-19 08:23:11,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=649040.0, ans=0.0 2023-11-19 08:23:18,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=649106.6666666666, ans=0.125 2023-11-19 08:23:30,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=649173.3333333334, ans=0.1 2023-11-19 08:23:37,817 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 1200, loss[loss=0.09517, simple_loss=0.119, pruned_loss=0.02681, audio_tagging_loss=0.008869, over 15933.00 frames. ], tot_loss[loss=0.08856, simple_loss=0.1075, pruned_loss=0.02439, audio_tagging_loss=0.01043, over 3035383.91 frames. ], batch size: 58, lr: 7.89e-03, grad_scale: 32.0 2023-11-19 08:23:52,033 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.405e+01 8.142e+01 8.954e+01 1.003e+02 1.503e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-19 08:23:53,899 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=649306.6666666666, ans=0.0 2023-11-19 08:23:55,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=649306.6666666666, ans=0.2 2023-11-19 08:24:19,591 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.26 vs. limit=15.0 2023-11-19 08:24:24,787 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=649506.6666666666, ans=0.1 2023-11-19 08:24:33,447 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 1250, loss[loss=0.06431, simple_loss=0.07247, pruned_loss=0.01557, audio_tagging_loss=0.0125, over 14853.00 frames. ], tot_loss[loss=0.08839, simple_loss=0.1069, pruned_loss=0.02442, audio_tagging_loss=0.01052, over 3039774.62 frames. ], batch size: 57, lr: 7.89e-03, grad_scale: 32.0 2023-11-19 08:24:38,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=649573.3333333334, ans=0.125 2023-11-19 08:24:50,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=649640.0, ans=0.0 2023-11-19 08:25:20,817 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=649840.0, ans=0.2 2023-11-19 08:25:22,955 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=649840.0, ans=0.125 2023-11-19 08:25:29,553 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 1300, loss[loss=0.09708, simple_loss=0.1262, pruned_loss=0.02461, audio_tagging_loss=0.00937, over 14682.00 frames. ], tot_loss[loss=0.08823, simple_loss=0.107, pruned_loss=0.02422, audio_tagging_loss=0.01053, over 3044151.95 frames. 
], batch size: 54, lr: 7.89e-03, grad_scale: 16.0 2023-11-19 08:25:41,806 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=649973.3333333334, ans=0.0 2023-11-19 08:25:43,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=649973.3333333334, ans=0.2 2023-11-19 08:25:44,818 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.781e+01 8.314e+01 8.988e+01 1.010e+02 1.259e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 08:25:56,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=650040.0, ans=0.125 2023-11-19 08:26:16,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=650173.3333333334, ans=0.125 2023-11-19 08:26:19,813 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=650173.3333333334, ans=0.125 2023-11-19 08:26:20,230 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.33 vs. limit=10.0 2023-11-19 08:26:25,336 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 1350, loss[loss=0.07347, simple_loss=0.08784, pruned_loss=0.01843, audio_tagging_loss=0.01112, over 14410.00 frames. ], tot_loss[loss=0.08802, simple_loss=0.1066, pruned_loss=0.0242, audio_tagging_loss=0.01051, over 3043947.96 frames. ], batch size: 55, lr: 7.89e-03, grad_scale: 16.0 2023-11-19 08:26:35,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=650306.6666666666, ans=0.125 2023-11-19 08:26:50,541 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=650373.3333333334, ans=0.0 2023-11-19 08:26:55,185 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=15.0 2023-11-19 08:26:59,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=650440.0, ans=0.1 2023-11-19 08:27:00,145 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 08:27:02,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=650440.0, ans=0.09899494936611666 2023-11-19 08:27:05,210 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 08:27:13,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=650506.6666666666, ans=0.0 2023-11-19 08:27:20,252 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 1400, loss[loss=0.1142, simple_loss=0.1406, pruned_loss=0.03727, audio_tagging_loss=0.006689, over 15547.00 frames. 
], tot_loss[loss=0.08804, simple_loss=0.1061, pruned_loss=0.02421, audio_tagging_loss=0.01076, over 3041099.86 frames. ], batch size: 56, lr: 7.89e-03, grad_scale: 16.0 2023-11-19 08:27:22,005 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.81 vs. limit=15.0 2023-11-19 08:27:22,060 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=15.0 2023-11-19 08:27:32,627 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.96 vs. limit=6.0 2023-11-19 08:27:36,590 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.876e+01 8.323e+01 8.984e+01 9.924e+01 1.651e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-19 08:27:38,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=650640.0, ans=0.1 2023-11-19 08:27:41,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=650640.0, ans=0.125 2023-11-19 08:27:51,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=650706.6666666666, ans=0.0 2023-11-19 08:28:14,301 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=22.5 2023-11-19 08:28:17,049 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 1450, loss[loss=0.07731, simple_loss=0.09253, pruned_loss=0.01987, audio_tagging_loss=0.01117, over 16173.00 frames. ], tot_loss[loss=0.08829, simple_loss=0.1066, pruned_loss=0.02422, audio_tagging_loss=0.01079, over 3041435.48 frames. ], batch size: 63, lr: 7.88e-03, grad_scale: 16.0 2023-11-19 08:28:37,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=650973.3333333334, ans=0.125 2023-11-19 08:28:39,225 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 08:28:40,279 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=651040.0, ans=0.125 2023-11-19 08:28:41,342 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 08:28:44,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=651040.0, ans=0.0 2023-11-19 08:29:00,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=651173.3333333334, ans=0.09899494936611666 2023-11-19 08:29:12,416 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 1500, loss[loss=0.09264, simple_loss=0.1143, pruned_loss=0.02391, audio_tagging_loss=0.01158, over 16206.00 frames. ], tot_loss[loss=0.08899, simple_loss=0.1074, pruned_loss=0.02446, audio_tagging_loss=0.01083, over 3041585.28 frames. 
], batch size: 62, lr: 7.88e-03, grad_scale: 16.0 2023-11-19 08:29:15,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=651240.0, ans=0.125 2023-11-19 08:29:17,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=651240.0, ans=0.1 2023-11-19 08:29:23,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=651306.6666666666, ans=0.0 2023-11-19 08:29:27,757 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.092e+01 8.592e+01 9.437e+01 1.054e+02 1.547e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-19 08:29:29,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=651306.6666666666, ans=0.125 2023-11-19 08:29:33,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=651373.3333333334, ans=10.0 2023-11-19 08:29:42,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=651373.3333333334, ans=0.125 2023-11-19 08:29:57,826 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 08:30:08,177 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 1550, loss[loss=0.1137, simple_loss=0.1458, pruned_loss=0.03147, audio_tagging_loss=0.009311, over 14973.00 frames. ], tot_loss[loss=0.08936, simple_loss=0.1078, pruned_loss=0.02462, audio_tagging_loss=0.01083, over 3041748.31 frames. ], batch size: 56, lr: 7.88e-03, grad_scale: 16.0 2023-11-19 08:30:13,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=651573.3333333334, ans=0.2 2023-11-19 08:30:19,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=651640.0, ans=0.2 2023-11-19 08:30:28,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=651640.0, ans=0.0 2023-11-19 08:30:47,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=651773.3333333334, ans=15.0 2023-11-19 08:30:48,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=651773.3333333334, ans=0.125 2023-11-19 08:30:48,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=651773.3333333334, ans=0.125 2023-11-19 08:30:52,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=651840.0, ans=0.125 2023-11-19 08:31:04,997 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 1600, loss[loss=0.06574, simple_loss=0.07919, pruned_loss=0.01587, audio_tagging_loss=0.01027, over 14492.00 frames. ], tot_loss[loss=0.08918, simple_loss=0.1077, pruned_loss=0.02453, audio_tagging_loss=0.01079, over 3036703.22 frames. 
], batch size: 57, lr: 7.88e-03, grad_scale: 32.0 2023-11-19 08:31:13,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=651906.6666666666, ans=0.0 2023-11-19 08:31:20,262 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.774e+01 8.197e+01 8.883e+01 9.741e+01 1.380e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-19 08:31:25,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=651973.3333333334, ans=0.0 2023-11-19 08:31:44,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=652106.6666666666, ans=0.2 2023-11-19 08:31:52,414 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=652173.3333333334, ans=0.0 2023-11-19 08:32:00,809 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 1650, loss[loss=0.08875, simple_loss=0.1107, pruned_loss=0.02107, audio_tagging_loss=0.01236, over 14838.00 frames. ], tot_loss[loss=0.08848, simple_loss=0.1065, pruned_loss=0.02433, audio_tagging_loss=0.01091, over 3034577.85 frames. ], batch size: 55, lr: 7.88e-03, grad_scale: 32.0 2023-11-19 08:32:01,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=652240.0, ans=0.1 2023-11-19 08:32:16,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=652306.6666666666, ans=0.1 2023-11-19 08:32:31,415 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.87 vs. limit=15.0 2023-11-19 08:32:39,745 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.67 vs. limit=12.0 2023-11-19 08:32:52,514 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 08:32:56,431 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 1700, loss[loss=0.09189, simple_loss=0.1167, pruned_loss=0.02648, audio_tagging_loss=0.007082, over 14601.00 frames. ], tot_loss[loss=0.08791, simple_loss=0.1056, pruned_loss=0.02415, audio_tagging_loss=0.01094, over 3032415.29 frames. 
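The many ScheduledFloat lines record module hyperparameters (dropout probabilities, skip rates, balancer limits) whose current value ("ans") depends on batch_count. A hedged sketch of such a schedule follows, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the class name and breakpoints are illustrative, and only the final value 0.1 is taken from the "...dropout_p, ..., ans=0.1" entries above.

    class PiecewiseLinearSchedule:
        """Value follows straight-line segments between sorted (batch_count, value) points."""
        def __init__(self, *points):
            self.points = sorted(points)   # at least one (batch_count, value) pair

        def value_at(self, batch_count):
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    # e.g. a dropout schedule that has long since reached its floor:
    dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1))
    assert dropout_p.value_at(650640.0) == 0.1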
], batch size: 58, lr: 7.87e-03, grad_scale: 16.0 2023-11-19 08:32:58,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=652573.3333333334, ans=0.1 2023-11-19 08:33:03,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=652573.3333333334, ans=0.0 2023-11-19 08:33:08,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=652640.0, ans=0.125 2023-11-19 08:33:13,290 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.601e+01 9.234e+01 1.036e+02 1.381e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 08:33:36,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=652773.3333333334, ans=0.125 2023-11-19 08:33:40,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=652840.0, ans=0.0 2023-11-19 08:33:52,660 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 1750, loss[loss=0.08824, simple_loss=0.1018, pruned_loss=0.02552, audio_tagging_loss=0.01183, over 15108.00 frames. ], tot_loss[loss=0.08813, simple_loss=0.106, pruned_loss=0.02427, audio_tagging_loss=0.01088, over 3036034.45 frames. ], batch size: 57, lr: 7.87e-03, grad_scale: 16.0 2023-11-19 08:34:01,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=652906.6666666666, ans=0.07 2023-11-19 08:34:08,349 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 08:34:12,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=652973.3333333334, ans=0.125 2023-11-19 08:34:19,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=653040.0, ans=0.125 2023-11-19 08:34:41,194 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.32 vs. limit=12.0 2023-11-19 08:34:48,509 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 1800, loss[loss=0.06892, simple_loss=0.07883, pruned_loss=0.01844, audio_tagging_loss=0.01106, over 15157.00 frames. ], tot_loss[loss=0.08707, simple_loss=0.1053, pruned_loss=0.0238, audio_tagging_loss=0.01064, over 3040440.64 frames. ], batch size: 58, lr: 7.87e-03, grad_scale: 16.0 2023-11-19 08:35:00,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=653306.6666666666, ans=0.125 2023-11-19 08:35:05,315 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.029e+01 8.385e+01 9.235e+01 9.785e+01 1.398e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 08:35:08,801 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=653306.6666666666, ans=0.125 2023-11-19 08:35:25,901 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.21 vs. 
limit=10.0 2023-11-19 08:35:28,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=653440.0, ans=0.0 2023-11-19 08:35:32,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=653506.6666666666, ans=0.125 2023-11-19 08:35:44,401 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.97 vs. limit=10.0 2023-11-19 08:35:44,909 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 1850, loss[loss=0.05927, simple_loss=0.06147, pruned_loss=0.01435, audio_tagging_loss=0.01419, over 14093.00 frames. ], tot_loss[loss=0.08699, simple_loss=0.1051, pruned_loss=0.02383, audio_tagging_loss=0.0106, over 3033218.94 frames. ], batch size: 54, lr: 7.87e-03, grad_scale: 16.0 2023-11-19 08:35:53,960 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.28 vs. limit=15.0 2023-11-19 08:36:00,967 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=653640.0, ans=0.0 2023-11-19 08:36:01,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=653640.0, ans=0.125 2023-11-19 08:36:03,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=653640.0, ans=0.0 2023-11-19 08:36:23,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=653773.3333333334, ans=0.95 2023-11-19 08:36:32,112 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 08:36:36,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=653840.0, ans=0.0 2023-11-19 08:36:38,187 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.84 vs. limit=15.0 2023-11-19 08:36:38,297 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. limit=6.0 2023-11-19 08:36:38,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=653840.0, ans=0.1 2023-11-19 08:36:40,902 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 1900, loss[loss=0.09372, simple_loss=0.1165, pruned_loss=0.0252, audio_tagging_loss=0.01028, over 14676.00 frames. ], tot_loss[loss=0.0873, simple_loss=0.1058, pruned_loss=0.02393, audio_tagging_loss=0.01047, over 3031527.96 frames. ], batch size: 54, lr: 7.87e-03, grad_scale: 16.0 2023-11-19 08:36:46,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=653906.6666666666, ans=0.0 2023-11-19 08:36:47,216 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. 
limit=6.0 2023-11-19 08:36:48,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=653906.6666666666, ans=0.07 2023-11-19 08:36:57,970 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.243e+01 8.285e+01 9.091e+01 1.015e+02 1.507e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-19 08:37:05,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=654040.0, ans=0.2 2023-11-19 08:37:18,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=654106.6666666666, ans=0.0 2023-11-19 08:37:23,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=654106.6666666666, ans=0.125 2023-11-19 08:37:28,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=654173.3333333334, ans=0.0 2023-11-19 08:37:36,682 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 1950, loss[loss=0.06903, simple_loss=0.08245, pruned_loss=0.01677, audio_tagging_loss=0.01104, over 14399.00 frames. ], tot_loss[loss=0.08792, simple_loss=0.1066, pruned_loss=0.02415, audio_tagging_loss=0.01046, over 3030751.54 frames. ], batch size: 56, lr: 7.86e-03, grad_scale: 16.0 2023-11-19 08:37:54,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=654306.6666666666, ans=0.125 2023-11-19 08:38:07,110 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=654373.3333333334, ans=0.0 2023-11-19 08:38:14,328 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.18 vs. limit=22.5 2023-11-19 08:38:30,397 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.03 vs. limit=15.0 2023-11-19 08:38:32,852 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 2000, loss[loss=0.09679, simple_loss=0.119, pruned_loss=0.02926, audio_tagging_loss=0.008039, over 15558.00 frames. ], tot_loss[loss=0.08683, simple_loss=0.105, pruned_loss=0.02378, audio_tagging_loss=0.01054, over 3029106.09 frames. ], batch size: 58, lr: 7.86e-03, grad_scale: 16.0 2023-11-19 08:38:41,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=654573.3333333334, ans=0.125 2023-11-19 08:38:51,180 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.460e+01 8.516e+01 9.233e+01 1.009e+02 1.531e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 08:39:03,034 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=654706.6666666666, ans=0.0 2023-11-19 08:39:20,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=654840.0, ans=0.125 2023-11-19 08:39:26,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=654840.0, ans=0.125 2023-11-19 08:39:28,857 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 2050, loss[loss=0.1019, simple_loss=0.1177, pruned_loss=0.03098, audio_tagging_loss=0.01209, over 15205.00 frames. 
], tot_loss[loss=0.0878, simple_loss=0.1061, pruned_loss=0.02427, audio_tagging_loss=0.0105, over 3035935.61 frames. ], batch size: 56, lr: 7.86e-03, grad_scale: 16.0 2023-11-19 08:39:38,452 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.43 vs. limit=15.0 2023-11-19 08:39:47,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=654973.3333333334, ans=0.125 2023-11-19 08:39:58,847 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.37 vs. limit=22.5 2023-11-19 08:40:12,026 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.880e-01 2023-11-19 08:40:25,107 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 2100, loss[loss=0.08272, simple_loss=0.1043, pruned_loss=0.02089, audio_tagging_loss=0.009692, over 15754.00 frames. ], tot_loss[loss=0.08772, simple_loss=0.1061, pruned_loss=0.02421, audio_tagging_loss=0.01046, over 3039349.41 frames. ], batch size: 58, lr: 7.86e-03, grad_scale: 16.0 2023-11-19 08:40:42,514 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.190e+01 8.322e+01 9.112e+01 9.913e+01 1.952e+02, threshold=1.822e+02, percent-clipped=1.0 2023-11-19 08:41:16,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=655506.6666666666, ans=0.0 2023-11-19 08:41:20,757 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 2150, loss[loss=0.08872, simple_loss=0.1141, pruned_loss=0.02273, audio_tagging_loss=0.00893, over 14933.00 frames. ], tot_loss[loss=0.08838, simple_loss=0.1071, pruned_loss=0.02437, audio_tagging_loss=0.01044, over 3039067.70 frames. ], batch size: 53, lr: 7.86e-03, grad_scale: 16.0 2023-11-19 08:41:24,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=655573.3333333334, ans=0.0 2023-11-19 08:41:29,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=655573.3333333334, ans=15.0 2023-11-19 08:41:47,589 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=655706.6666666666, ans=0.125 2023-11-19 08:41:48,744 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=655706.6666666666, ans=0.125 2023-11-19 08:41:52,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=655773.3333333334, ans=0.125 2023-11-19 08:41:53,796 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 08:42:02,781 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.89 vs. 
limit=15.0 2023-11-19 08:42:03,569 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=655773.3333333334, ans=0.02 2023-11-19 08:42:13,308 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=655840.0, ans=0.125 2023-11-19 08:42:16,773 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 2200, loss[loss=0.1149, simple_loss=0.1469, pruned_loss=0.03223, audio_tagging_loss=0.009205, over 15895.00 frames. ], tot_loss[loss=0.08873, simple_loss=0.1075, pruned_loss=0.02454, audio_tagging_loss=0.01045, over 3041501.48 frames. ], batch size: 55, lr: 7.85e-03, grad_scale: 16.0 2023-11-19 08:42:34,503 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.518e+01 8.336e+01 8.931e+01 9.747e+01 1.930e+02, threshold=1.786e+02, percent-clipped=1.0 2023-11-19 08:42:36,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=655973.3333333334, ans=0.0 2023-11-19 08:42:38,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=656040.0, ans=0.125 2023-11-19 08:42:40,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=656040.0, ans=0.125 2023-11-19 08:42:57,898 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 08:43:12,648 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 2250, loss[loss=0.09199, simple_loss=0.1125, pruned_loss=0.02476, audio_tagging_loss=0.01099, over 15808.00 frames. ], tot_loss[loss=0.08794, simple_loss=0.1068, pruned_loss=0.02409, audio_tagging_loss=0.01046, over 3033639.32 frames. ], batch size: 58, lr: 7.85e-03, grad_scale: 16.0 2023-11-19 08:43:18,404 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.16 vs. limit=15.0 2023-11-19 08:43:49,529 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.24 vs. limit=22.5 2023-11-19 08:43:54,776 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.40 vs. limit=22.5 2023-11-19 08:44:00,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=656506.6666666666, ans=0.125 2023-11-19 08:44:01,255 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.09 vs. limit=15.0 2023-11-19 08:44:07,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=656573.3333333334, ans=0.125 2023-11-19 08:44:08,702 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 2300, loss[loss=0.09929, simple_loss=0.1252, pruned_loss=0.02409, audio_tagging_loss=0.01261, over 15339.00 frames. ], tot_loss[loss=0.08743, simple_loss=0.1059, pruned_loss=0.02388, audio_tagging_loss=0.01061, over 3036572.82 frames. 
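The WARNING "Exclude cut ..." entries drop AudioSet clips whose placeholder transcript is longer than the subsampled feature sequence: 100 input frames shrink to 23 frames after subsampling, fewer than the 24 BPE tokens, and a transducer loss needs at least one frame per output token. A sketch of such a filter is below, assuming the usual icefall Conv2d subsampling arithmetic (which does reproduce 100 -> 23); keep_cut and the exact length formula are illustrative rather than the actual train_asr.py code.

    import logging

    def keep_cut(cut, sp):
        """Return False for cuts too short to align: frames after subsampling < tokens."""
        T = ((cut.num_frames - 7) // 2 + 1) // 2   # assumed post-subsampling length
        tokens = sp.encode(cut.supervisions[0].text, out_type=str)
        if T < len(tokens):
            logging.warning(
                f"Exclude cut with ID {cut.id} from training. "
                f"Number of frames (before subsampling): {cut.num_frames}. "
                f"Number of frames (after subsampling): {T}. "
                f"Number of tokens: {len(tokens)}")
            return False
        return True

Since every excluded cut carries the same "Dummy text added as a place holder" transcript, these warnings appear to be expected for the tagging-only AudioSet data rather than a sign of corrupted speech cuts.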
], batch size: 55, lr: 7.85e-03, grad_scale: 16.0 2023-11-19 08:44:08,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=656573.3333333334, ans=0.125 2023-11-19 08:44:26,703 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.756e+01 8.387e+01 9.079e+01 9.954e+01 1.397e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-19 08:44:30,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=656706.6666666666, ans=0.125 2023-11-19 08:44:50,700 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.34 vs. limit=15.0 2023-11-19 08:44:57,983 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 08:45:04,357 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 2350, loss[loss=0.09419, simple_loss=0.118, pruned_loss=0.02409, audio_tagging_loss=0.01109, over 15801.00 frames. ], tot_loss[loss=0.08771, simple_loss=0.1064, pruned_loss=0.02396, audio_tagging_loss=0.01053, over 3042009.76 frames. ], batch size: 56, lr: 7.85e-03, grad_scale: 16.0 2023-11-19 08:45:10,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=656906.6666666666, ans=0.125 2023-11-19 08:45:12,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=656906.6666666666, ans=0.2 2023-11-19 08:45:24,058 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.68 vs. limit=15.0 2023-11-19 08:45:32,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=657040.0, ans=0.035 2023-11-19 08:45:35,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=657040.0, ans=0.125 2023-11-19 08:45:55,200 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.27 vs. limit=12.0 2023-11-19 08:46:00,324 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 2400, loss[loss=0.0757, simple_loss=0.08812, pruned_loss=0.01647, audio_tagging_loss=0.01517, over 14263.00 frames. ], tot_loss[loss=0.08824, simple_loss=0.107, pruned_loss=0.02409, audio_tagging_loss=0.01063, over 3042537.13 frames. 
], batch size: 54, lr: 7.85e-03, grad_scale: 32.0 2023-11-19 08:46:00,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=657240.0, ans=15.0 2023-11-19 08:46:09,893 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=657306.6666666666, ans=0.09899494936611666 2023-11-19 08:46:10,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=657306.6666666666, ans=0.125 2023-11-19 08:46:16,280 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.53 vs. limit=12.0 2023-11-19 08:46:17,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=657306.6666666666, ans=0.95 2023-11-19 08:46:17,889 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.290e+01 8.346e+01 9.299e+01 9.977e+01 1.391e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 08:46:18,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=657306.6666666666, ans=0.0 2023-11-19 08:46:21,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=657373.3333333334, ans=0.125 2023-11-19 08:46:22,658 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.93 vs. limit=12.0 2023-11-19 08:46:25,478 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.52 vs. limit=15.0 2023-11-19 08:46:28,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=657373.3333333334, ans=0.125 2023-11-19 08:46:29,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=657373.3333333334, ans=0.1 2023-11-19 08:46:43,692 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.37 vs. limit=15.0 2023-11-19 08:46:51,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=657506.6666666666, ans=0.125 2023-11-19 08:46:53,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=657506.6666666666, ans=0.125 2023-11-19 08:46:56,227 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 2450, loss[loss=0.09812, simple_loss=0.1232, pruned_loss=0.02793, audio_tagging_loss=0.008582, over 15544.00 frames. ], tot_loss[loss=0.08849, simple_loss=0.107, pruned_loss=0.0242, audio_tagging_loss=0.01081, over 3043549.38 frames. 
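Up to rounding, the tot_loss fields combine as loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss (checking batch 2450 above: 0.5 * 0.107 + 0.0242 + 0.01081 = 0.08851, versus the logged 0.08849), i.e. a pruned RNN-T objective with a down-weighted simple (linear-joiner) loss plus the audio-tagging distillation term. A sketch of the combination; the 0.5 and 1.0 scales are inferred from the logged numbers, and the argument names are assumptions about the actual flags.

    def combine_losses(simple_loss, pruned_loss, audio_tagging_loss,
                       simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
        # scales inferred by solving the logged identities, not read from the code
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    assert abs(combine_losses(0.107, 0.0242, 0.01081) - 0.08849) < 1e-3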
], batch size: 56, lr: 7.84e-03, grad_scale: 32.0 2023-11-19 08:46:56,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=657573.3333333334, ans=0.125 2023-11-19 08:47:29,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=657773.3333333334, ans=0.125 2023-11-19 08:47:33,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=657773.3333333334, ans=0.0 2023-11-19 08:47:40,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=657840.0, ans=0.09899494936611666 2023-11-19 08:47:51,715 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 2500, loss[loss=0.08378, simple_loss=0.1039, pruned_loss=0.02275, audio_tagging_loss=0.009075, over 15452.00 frames. ], tot_loss[loss=0.08794, simple_loss=0.106, pruned_loss=0.02407, audio_tagging_loss=0.01089, over 3042791.89 frames. ], batch size: 59, lr: 7.84e-03, grad_scale: 32.0 2023-11-19 08:47:53,300 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.44 vs. limit=15.0 2023-11-19 08:48:01,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=657973.3333333334, ans=0.125 2023-11-19 08:48:06,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=657973.3333333334, ans=0.125 2023-11-19 08:48:09,790 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.559e+01 8.238e+01 9.130e+01 9.958e+01 1.355e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-19 08:48:26,863 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.00 vs. limit=15.0 2023-11-19 08:48:41,900 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.81 vs. limit=15.0 2023-11-19 08:48:48,242 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 2550, loss[loss=0.08017, simple_loss=0.09518, pruned_loss=0.02203, audio_tagging_loss=0.01055, over 14714.00 frames. ], tot_loss[loss=0.0888, simple_loss=0.1071, pruned_loss=0.02448, audio_tagging_loss=0.01076, over 3041852.25 frames. ], batch size: 55, lr: 7.84e-03, grad_scale: 32.0 2023-11-19 08:48:51,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=658240.0, ans=0.125 2023-11-19 08:48:58,231 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 08:49:03,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=658306.6666666666, ans=0.0 2023-11-19 08:49:13,874 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=22.5 2023-11-19 08:49:24,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=658440.0, ans=0.125 2023-11-19 08:49:25,898 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.58 vs. 
limit=15.0 2023-11-19 08:49:28,538 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.55 vs. limit=22.5 2023-11-19 08:49:32,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=658506.6666666666, ans=0.125 2023-11-19 08:49:43,987 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 2600, loss[loss=0.05262, simple_loss=0.062, pruned_loss=0.01032, audio_tagging_loss=0.0113, over 15386.00 frames. ], tot_loss[loss=0.08829, simple_loss=0.1065, pruned_loss=0.02441, audio_tagging_loss=0.01063, over 3048104.36 frames. ], batch size: 60, lr: 7.84e-03, grad_scale: 32.0 2023-11-19 08:49:53,425 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.70 vs. limit=15.0 2023-11-19 08:49:56,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=658640.0, ans=0.125 2023-11-19 08:50:01,805 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 8.704e+01 9.558e+01 1.082e+02 2.151e+02, threshold=1.912e+02, percent-clipped=1.0 2023-11-19 08:50:31,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=658840.0, ans=0.0 2023-11-19 08:50:38,761 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.61 vs. limit=15.0 2023-11-19 08:50:40,080 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 2650, loss[loss=0.09126, simple_loss=0.104, pruned_loss=0.02792, audio_tagging_loss=0.01136, over 15127.00 frames. ], tot_loss[loss=0.08838, simple_loss=0.1068, pruned_loss=0.02441, audio_tagging_loss=0.01056, over 3049533.94 frames. ], batch size: 56, lr: 7.84e-03, grad_scale: 32.0 2023-11-19 08:50:40,778 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0 2023-11-19 08:50:57,111 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.37 vs. limit=15.0 2023-11-19 08:51:11,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=659040.0, ans=0.2 2023-11-19 08:51:14,684 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.11 vs. limit=12.0 2023-11-19 08:51:29,168 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.25 vs. limit=15.0 2023-11-19 08:51:36,943 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 2700, loss[loss=0.1012, simple_loss=0.1162, pruned_loss=0.03021, audio_tagging_loss=0.01292, over 16226.00 frames. ], tot_loss[loss=0.08865, simple_loss=0.107, pruned_loss=0.02468, audio_tagging_loss=0.01047, over 3046255.10 frames. 
], batch size: 62, lr: 7.83e-03, grad_scale: 32.0 2023-11-19 08:51:54,344 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.151e+01 8.620e+01 9.365e+01 1.021e+02 1.535e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-19 08:51:54,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=659306.6666666666, ans=0.0 2023-11-19 08:52:04,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=659373.3333333334, ans=0.0 2023-11-19 08:52:16,798 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=659440.0, ans=0.125 2023-11-19 08:52:24,778 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=659506.6666666666, ans=0.2 2023-11-19 08:52:31,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=659573.3333333334, ans=0.125 2023-11-19 08:52:31,919 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 2750, loss[loss=0.09916, simple_loss=0.1225, pruned_loss=0.02732, audio_tagging_loss=0.01057, over 15751.00 frames. ], tot_loss[loss=0.08886, simple_loss=0.1073, pruned_loss=0.02477, audio_tagging_loss=0.01043, over 3044224.72 frames. ], batch size: 57, lr: 7.83e-03, grad_scale: 32.0 2023-11-19 08:53:06,536 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=659773.3333333334, ans=0.125 2023-11-19 08:53:11,859 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=659773.3333333334, ans=0.0 2023-11-19 08:53:18,438 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 08:53:26,880 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 2800, loss[loss=0.07634, simple_loss=0.09128, pruned_loss=0.02027, audio_tagging_loss=0.01043, over 14576.00 frames. ], tot_loss[loss=0.08815, simple_loss=0.1067, pruned_loss=0.02444, audio_tagging_loss=0.01034, over 3052686.81 frames. ], batch size: 55, lr: 7.83e-03, grad_scale: 32.0 2023-11-19 08:53:28,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=659906.6666666666, ans=0.0 2023-11-19 08:53:45,301 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.607e+01 8.203e+01 8.899e+01 9.500e+01 1.297e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-19 08:53:50,264 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.95 vs. 
limit=10.0 2023-11-19 08:53:57,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=660040.0, ans=0.0 2023-11-19 08:53:57,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=660040.0, ans=0.125 2023-11-19 08:54:01,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=660106.6666666666, ans=0.125 2023-11-19 08:54:04,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=660106.6666666666, ans=0.1 2023-11-19 08:54:10,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=660173.3333333334, ans=0.1 2023-11-19 08:54:16,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=660173.3333333334, ans=0.0 2023-11-19 08:54:22,320 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 2850, loss[loss=0.08558, simple_loss=0.095, pruned_loss=0.02698, audio_tagging_loss=0.0111, over 14937.00 frames. ], tot_loss[loss=0.08839, simple_loss=0.1071, pruned_loss=0.02449, audio_tagging_loss=0.01037, over 3049210.32 frames. ], batch size: 57, lr: 7.83e-03, grad_scale: 32.0 2023-11-19 08:54:30,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=660240.0, ans=0.2 2023-11-19 08:54:52,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=660373.3333333334, ans=0.125 2023-11-19 08:54:58,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=660440.0, ans=0.0 2023-11-19 08:55:18,673 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 2900, loss[loss=0.07949, simple_loss=0.0974, pruned_loss=0.02141, audio_tagging_loss=0.009376, over 15374.00 frames. ], tot_loss[loss=0.08818, simple_loss=0.107, pruned_loss=0.02432, audio_tagging_loss=0.01036, over 3046610.71 frames. ], batch size: 57, lr: 7.83e-03, grad_scale: 32.0 2023-11-19 08:55:27,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=660573.3333333334, ans=0.0 2023-11-19 08:55:28,370 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0 2023-11-19 08:55:36,370 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.241e+01 8.534e+01 9.333e+01 1.020e+02 1.874e+02, threshold=1.867e+02, percent-clipped=1.0 2023-11-19 08:56:00,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=660773.3333333334, ans=0.125 2023-11-19 08:56:14,632 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 2950, loss[loss=0.09875, simple_loss=0.1217, pruned_loss=0.02596, audio_tagging_loss=0.01192, over 16524.00 frames. ], tot_loss[loss=0.08827, simple_loss=0.1074, pruned_loss=0.02421, audio_tagging_loss=0.01038, over 3046077.37 frames. 
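The recurring "Whitening: name=..., metric=X vs. limit=Y" lines compare a whiteness statistic of a module's activations against a limit; a penalty presumably only activates once the metric exceeds the limit, which is why most entries (e.g. metric=6.95 vs. limit=10.0 just above) sit comfortably below it. One plausible metric is sketched below: it equals 1.0 for a perfectly white (isotropic) channel covariance and grows with anisotropy; treat the exact formula as an assumption about scaling.py.

    import torch

    def whitening_metric(x, num_groups=1):
        """x: (num_frames, num_channels). Returns g * trace(C^2) / trace(C)^2 per group."""
        n, c = x.shape
        g = c // num_groups
        x = x.reshape(n, num_groups, g).transpose(0, 1)    # (num_groups, n, g)
        covar = torch.matmul(x.transpose(1, 2), x) / n     # per-group covariance (g, g)
        trace_c = covar.diagonal(dim1=1, dim2=2).sum(-1)   # trace(C)
        trace_c2 = (covar * covar).sum(dim=(1, 2))         # trace(C @ C), C symmetric
        return (g * trace_c2 / trace_c ** 2).mean()

    # For covar = I the metric is exactly 1.0; a rank-one (fully correlated) covariance
    # pushes it to g, which is the regime the "vs. limit" entries are watching for.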
], batch size: 60, lr: 7.82e-03, grad_scale: 32.0 2023-11-19 08:56:27,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=660973.3333333334, ans=0.125 2023-11-19 08:56:28,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=660973.3333333334, ans=0.125 2023-11-19 08:56:30,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=660973.3333333334, ans=0.0 2023-11-19 08:56:33,163 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=22.5 2023-11-19 08:56:51,269 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=661106.6666666666, ans=0.2 2023-11-19 08:56:55,790 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=22.5 2023-11-19 08:57:10,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=661240.0, ans=0.125 2023-11-19 08:57:10,810 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 3000, loss[loss=0.06676, simple_loss=0.07341, pruned_loss=0.01609, audio_tagging_loss=0.01397, over 16801.00 frames. ], tot_loss[loss=0.08855, simple_loss=0.1075, pruned_loss=0.02428, audio_tagging_loss=0.01051, over 3051302.88 frames. ], batch size: 65, lr: 7.82e-03, grad_scale: 32.0 2023-11-19 08:57:10,811 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-19 08:57:27,169 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.0382, 3.4329, 3.9752, 3.6687], device='cuda:3') 2023-11-19 08:57:44,067 INFO [train_asr.py:1147] (3/4) Epoch 9, validation: loss=0.06604, simple_loss=0.05618, pruned_loss=0.006775, audio_tagging_loss=0.03117, over 4681554.00 frames. 2023-11-19 08:57:44,068 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-19 08:57:45,246 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=661240.0, ans=0.1 2023-11-19 08:57:48,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=661240.0, ans=0.0 2023-11-19 08:57:59,302 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=661306.6666666666, ans=0.0 2023-11-19 08:58:01,249 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.176e+01 8.862e+01 9.645e+01 1.062e+02 1.575e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-19 08:58:05,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=661373.3333333334, ans=0.07 2023-11-19 08:58:15,614 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.73 vs. 
limit=22.5 2023-11-19 08:58:21,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=661440.0, ans=0.1 2023-11-19 08:58:30,057 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=661506.6666666666, ans=0.125 2023-11-19 08:58:30,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=661506.6666666666, ans=0.125 2023-11-19 08:58:34,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=661506.6666666666, ans=0.1 2023-11-19 08:58:39,499 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 3050, loss[loss=0.05889, simple_loss=0.0574, pruned_loss=0.01183, audio_tagging_loss=0.01836, over 15265.00 frames. ], tot_loss[loss=0.08885, simple_loss=0.108, pruned_loss=0.02425, audio_tagging_loss=0.01057, over 3052969.17 frames. ], batch size: 59, lr: 7.82e-03, grad_scale: 32.0 2023-11-19 08:58:41,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=661573.3333333334, ans=0.0 2023-11-19 08:58:54,074 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.21 vs. limit=15.0 2023-11-19 08:59:01,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=661706.6666666666, ans=0.0 2023-11-19 08:59:06,534 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.50 vs. limit=10.0 2023-11-19 08:59:11,267 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.59 vs. limit=6.0 2023-11-19 08:59:12,951 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 08:59:28,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=661840.0, ans=0.2 2023-11-19 08:59:32,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=661840.0, ans=0.1 2023-11-19 08:59:36,844 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 3100, loss[loss=0.09221, simple_loss=0.1076, pruned_loss=0.02848, audio_tagging_loss=0.009923, over 15238.00 frames. ], tot_loss[loss=0.0889, simple_loss=0.1077, pruned_loss=0.02443, audio_tagging_loss=0.01062, over 3049271.17 frames. ], batch size: 61, lr: 7.82e-03, grad_scale: 32.0 2023-11-19 08:59:37,390 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.38 vs. 
limit=12.0 2023-11-19 08:59:42,549 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=7.890e-02 2023-11-19 08:59:54,977 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.318e+01 8.514e+01 9.258e+01 1.059e+02 1.772e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-19 09:00:01,907 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.58 vs. limit=22.5 2023-11-19 09:00:12,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=662106.6666666666, ans=0.0 2023-11-19 09:00:12,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=662106.6666666666, ans=0.1 2023-11-19 09:00:29,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=662173.3333333334, ans=0.125 2023-11-19 09:00:32,265 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 3150, loss[loss=0.1017, simple_loss=0.125, pruned_loss=0.0286, audio_tagging_loss=0.01058, over 15816.00 frames. ], tot_loss[loss=0.08919, simple_loss=0.1077, pruned_loss=0.02454, audio_tagging_loss=0.0108, over 3050852.10 frames. ], batch size: 58, lr: 7.82e-03, grad_scale: 32.0 2023-11-19 09:00:32,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=662240.0, ans=0.125 2023-11-19 09:00:45,610 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.48 vs. limit=22.5 2023-11-19 09:00:46,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=662306.6666666666, ans=0.035 2023-11-19 09:01:09,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=662440.0, ans=0.125 2023-11-19 09:01:10,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=662440.0, ans=0.1 2023-11-19 09:01:18,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=662506.6666666666, ans=0.0 2023-11-19 09:01:28,537 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 3200, loss[loss=0.07299, simple_loss=0.0827, pruned_loss=0.02086, audio_tagging_loss=0.01078, over 14340.00 frames. ], tot_loss[loss=0.08921, simple_loss=0.1076, pruned_loss=0.02456, audio_tagging_loss=0.01084, over 3056462.89 frames. ], batch size: 54, lr: 7.82e-03, grad_scale: 32.0 2023-11-19 09:01:41,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=662640.0, ans=0.125 2023-11-19 09:01:42,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=662640.0, ans=0.125 2023-11-19 09:01:42,330 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.59 vs. 
limit=15.0 2023-11-19 09:01:46,780 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.390e+01 8.354e+01 9.192e+01 1.025e+02 1.348e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-19 09:01:56,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=662706.6666666666, ans=0.1 2023-11-19 09:02:11,743 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.54 vs. limit=15.0 2023-11-19 09:02:16,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=662840.0, ans=0.0 2023-11-19 09:02:23,963 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.97 vs. limit=12.0 2023-11-19 09:02:24,476 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 3250, loss[loss=0.0893, simple_loss=0.1027, pruned_loss=0.02603, audio_tagging_loss=0.01193, over 15991.00 frames. ], tot_loss[loss=0.08861, simple_loss=0.1068, pruned_loss=0.02422, audio_tagging_loss=0.011, over 3058319.90 frames. ], batch size: 59, lr: 7.81e-03, grad_scale: 32.0 2023-11-19 09:02:28,906 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.02 vs. limit=15.0 2023-11-19 09:03:03,412 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.70 vs. limit=15.0 2023-11-19 09:03:20,475 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 3300, loss[loss=0.09329, simple_loss=0.1047, pruned_loss=0.03027, audio_tagging_loss=0.01066, over 14667.00 frames. ], tot_loss[loss=0.08939, simple_loss=0.1079, pruned_loss=0.02444, audio_tagging_loss=0.01102, over 3058780.88 frames. ], batch size: 57, lr: 7.81e-03, grad_scale: 32.0 2023-11-19 09:03:38,461 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 8.893e+01 9.711e+01 1.106e+02 1.510e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-19 09:03:46,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=663373.3333333334, ans=0.1 2023-11-19 09:03:46,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=663373.3333333334, ans=0.0 2023-11-19 09:03:57,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=663440.0, ans=0.0 2023-11-19 09:03:58,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=663440.0, ans=0.125 2023-11-19 09:03:58,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=663440.0, ans=0.125 2023-11-19 09:04:10,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=663506.6666666666, ans=0.0 2023-11-19 09:04:15,338 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.25 vs. 
limit=15.0 2023-11-19 09:04:16,815 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 3350, loss[loss=0.1052, simple_loss=0.1298, pruned_loss=0.02927, audio_tagging_loss=0.01098, over 15738.00 frames. ], tot_loss[loss=0.08953, simple_loss=0.1083, pruned_loss=0.02464, audio_tagging_loss=0.01076, over 3057643.12 frames. ], batch size: 60, lr: 7.81e-03, grad_scale: 32.0 2023-11-19 09:04:20,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=663573.3333333334, ans=0.125 2023-11-19 09:04:24,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.81 vs. limit=6.0 2023-11-19 09:04:26,048 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=663573.3333333334, ans=0.125 2023-11-19 09:04:44,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=663706.6666666666, ans=0.125 2023-11-19 09:04:49,924 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.33 vs. limit=10.0 2023-11-19 09:04:51,907 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.49 vs. limit=15.0 2023-11-19 09:05:05,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=663840.0, ans=0.07 2023-11-19 09:05:06,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=663840.0, ans=0.125 2023-11-19 09:05:12,839 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 3400, loss[loss=0.05798, simple_loss=0.07159, pruned_loss=0.01241, audio_tagging_loss=0.009783, over 15005.00 frames. ], tot_loss[loss=0.08928, simple_loss=0.1085, pruned_loss=0.02455, audio_tagging_loss=0.01049, over 3053182.46 frames. ], batch size: 58, lr: 7.81e-03, grad_scale: 32.0 2023-11-19 09:05:13,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=663906.6666666666, ans=0.125 2023-11-19 09:05:18,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=663906.6666666666, ans=0.125 2023-11-19 09:05:30,637 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.572e+01 8.652e+01 9.509e+01 1.042e+02 1.757e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-19 09:05:46,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=664106.6666666666, ans=0.5 2023-11-19 09:06:03,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=664173.3333333334, ans=0.125 2023-11-19 09:06:05,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=664173.3333333334, ans=0.125 2023-11-19 09:06:08,679 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 3450, loss[loss=0.07961, simple_loss=0.08754, pruned_loss=0.02334, audio_tagging_loss=0.0125, over 15170.00 frames. ], tot_loss[loss=0.08911, simple_loss=0.108, pruned_loss=0.02464, audio_tagging_loss=0.01047, over 3054909.99 frames. 
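During the batch-3000 validation pass above, zipformer.py also printed attn_weights_entropy tensors, one entropy value per attention head (tensor([4.0382, 3.4329, 3.9752, 3.6687])). Uniform attention over K key positions would give entropy ln K, so, assuming typical utterance lengths of a few hundred subsampled frames, head entropies around 3.4 to 4.0 suggest the heads have sharpened well away from uniform. A sketch of the diagnostic, assuming the entropy is averaged over query positions; the function name is illustrative.

    import torch

    def attn_weights_entropy(attn, eps=1e-20):
        """attn: (num_heads, num_queries, num_keys), rows summing to 1.
        Returns the mean per-head entropy of the attention distribution."""
        p = attn.clamp(min=eps)
        ent = -(p * p.log()).sum(dim=-1)   # entropy per (head, query)
        return ent.mean(dim=-1)            # average over queries -> (num_heads,)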
], batch size: 58, lr: 7.81e-03, grad_scale: 32.0 2023-11-19 09:06:20,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=664306.6666666666, ans=0.125 2023-11-19 09:06:20,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=664306.6666666666, ans=0.0 2023-11-19 09:06:30,231 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=664373.3333333334, ans=0.0 2023-11-19 09:06:37,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=664373.3333333334, ans=0.125 2023-11-19 09:06:38,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=664373.3333333334, ans=0.125 2023-11-19 09:06:58,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=664506.6666666666, ans=0.0 2023-11-19 09:07:04,549 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 3500, loss[loss=0.1177, simple_loss=0.1354, pruned_loss=0.04243, audio_tagging_loss=0.007546, over 15091.00 frames. ], tot_loss[loss=0.08864, simple_loss=0.1074, pruned_loss=0.02458, audio_tagging_loss=0.01035, over 3050660.52 frames. ], batch size: 53, lr: 7.80e-03, grad_scale: 32.0 2023-11-19 09:07:22,595 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.188e+01 8.336e+01 8.969e+01 9.703e+01 1.247e+02, threshold=1.794e+02, percent-clipped=0.0 2023-11-19 09:07:25,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=664640.0, ans=0.125 2023-11-19 09:07:32,797 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 09:07:36,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=664706.6666666666, ans=0.125 2023-11-19 09:07:36,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=664706.6666666666, ans=0.1 2023-11-19 09:07:46,702 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.19 vs. 
limit=15.0 2023-11-19 09:07:53,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=664840.0, ans=0.125 2023-11-19 09:07:59,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=664840.0, ans=0.125 2023-11-19 09:08:00,008 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=664906.6666666666, ans=0.1 2023-11-19 09:08:00,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=664906.6666666666, ans=0.1 2023-11-19 09:08:00,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=664906.6666666666, ans=0.2 2023-11-19 09:08:00,888 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 3550, loss[loss=0.06578, simple_loss=0.07455, pruned_loss=0.01423, audio_tagging_loss=0.01428, over 15008.00 frames. ], tot_loss[loss=0.08831, simple_loss=0.107, pruned_loss=0.02441, audio_tagging_loss=0.01039, over 3056150.47 frames. ], batch size: 58, lr: 7.80e-03, grad_scale: 32.0 2023-11-19 09:08:15,463 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=664973.3333333334, ans=0.2 2023-11-19 09:08:19,933 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.70 vs. limit=15.0 2023-11-19 09:08:22,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=665040.0, ans=0.125 2023-11-19 09:08:24,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=665040.0, ans=0.125 2023-11-19 09:08:30,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=665040.0, ans=0.2 2023-11-19 09:08:48,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=665173.3333333334, ans=0.125 2023-11-19 09:08:52,490 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=665173.3333333334, ans=0.125 2023-11-19 09:08:56,456 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 3600, loss[loss=0.1071, simple_loss=0.1339, pruned_loss=0.03213, audio_tagging_loss=0.008065, over 16069.00 frames. ], tot_loss[loss=0.08827, simple_loss=0.1074, pruned_loss=0.02429, audio_tagging_loss=0.01026, over 3053884.31 frames. ], batch size: 59, lr: 7.80e-03, grad_scale: 32.0 2023-11-19 09:09:05,814 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.70 vs. limit=15.0 2023-11-19 09:09:12,652 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.50 vs. 
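The scaling.py:213 lines print ans, the current value of a ScheduledFloat hyper-parameter (dropout rates, balancer probabilities, skip rates, ...) evaluated at the running batch_count. Consistent with these printouts, such a schedule can be modelled as piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints below are illustrative, not the ones used in this run:

    class ScheduledFloatSketch:
        """Piecewise-linear hyper-parameter value vs. batch_count (sketch)."""

        def __init__(self, *points: tuple[float, float]):
            self.points = sorted(points)  # (batch_count, value) breakpoints

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
            raise AssertionError("unreachable")

    # e.g. a dropout that anneals from 0.3 to 0.1 over the first 20k batches:
    dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p(663573.33))  # -> 0.1 once past the last breakpoint
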
limit=15.0 2023-11-19 09:09:15,141 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.775e+01 8.230e+01 8.845e+01 9.597e+01 1.384e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-19 09:09:20,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=665373.3333333334, ans=0.2 2023-11-19 09:09:46,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=665506.6666666666, ans=0.125 2023-11-19 09:09:48,460 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=665506.6666666666, ans=0.125 2023-11-19 09:09:52,792 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 3650, loss[loss=0.07744, simple_loss=0.09459, pruned_loss=0.02028, audio_tagging_loss=0.009865, over 14932.00 frames. ], tot_loss[loss=0.08838, simple_loss=0.1075, pruned_loss=0.02443, audio_tagging_loss=0.01021, over 3052911.13 frames. ], batch size: 56, lr: 7.80e-03, grad_scale: 16.0 2023-11-19 09:10:27,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=665773.3333333334, ans=0.1 2023-11-19 09:10:29,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=665773.3333333334, ans=0.125 2023-11-19 09:10:45,980 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=665840.0, ans=0.125 2023-11-19 09:10:48,457 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 3700, loss[loss=0.0648, simple_loss=0.07605, pruned_loss=0.01628, audio_tagging_loss=0.0105, over 16485.00 frames. ], tot_loss[loss=0.08809, simple_loss=0.107, pruned_loss=0.02428, audio_tagging_loss=0.01033, over 3052539.14 frames. ], batch size: 63, lr: 7.80e-03, grad_scale: 16.0 2023-11-19 09:10:55,387 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=665906.6666666666, ans=0.125 2023-11-19 09:11:06,292 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.09 vs. limit=22.5 2023-11-19 09:11:06,729 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.089e+01 8.367e+01 9.183e+01 1.016e+02 1.508e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-19 09:11:38,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=666173.3333333334, ans=0.0 2023-11-19 09:11:43,399 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 3750, loss[loss=0.09049, simple_loss=0.114, pruned_loss=0.02333, audio_tagging_loss=0.01018, over 14972.00 frames. ], tot_loss[loss=0.08811, simple_loss=0.1068, pruned_loss=0.02434, audio_tagging_loss=0.01035, over 3060109.95 frames. ], batch size: 54, lr: 7.79e-03, grad_scale: 16.0 2023-11-19 09:12:01,986 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=666306.6666666666, ans=0.0 2023-11-19 09:12:20,550 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 09:12:26,436 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.01 vs. limit=15.0 2023-11-19 09:12:39,254 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 3800, loss[loss=0.08851, simple_loss=0.1113, pruned_loss=0.0236, audio_tagging_loss=0.009261, over 15789.00 frames. ], tot_loss[loss=0.08772, simple_loss=0.1064, pruned_loss=0.02414, audio_tagging_loss=0.01041, over 3056274.18 frames. ], batch size: 56, lr: 7.79e-03, grad_scale: 16.0 2023-11-19 09:12:43,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=666573.3333333334, ans=0.0 2023-11-19 09:13:01,080 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.623e+01 8.849e+01 9.395e+01 1.021e+02 1.360e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-19 09:13:29,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=666840.0, ans=0.0 2023-11-19 09:13:32,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=666840.0, ans=0.0 2023-11-19 09:13:37,585 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 3850, loss[loss=0.08108, simple_loss=0.09575, pruned_loss=0.02285, audio_tagging_loss=0.01035, over 13918.00 frames. ], tot_loss[loss=0.08741, simple_loss=0.106, pruned_loss=0.02396, audio_tagging_loss=0.01045, over 3052017.99 frames. ], batch size: 53, lr: 7.79e-03, grad_scale: 16.0 2023-11-19 09:13:37,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=666906.6666666666, ans=0.1 2023-11-19 09:13:47,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=666906.6666666666, ans=0.125 2023-11-19 09:14:05,509 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.45 vs. limit=15.0 2023-11-19 09:14:21,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=667173.3333333334, ans=0.125 2023-11-19 09:14:33,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=667240.0, ans=0.125 2023-11-19 09:14:34,207 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 3900, loss[loss=0.07793, simple_loss=0.1012, pruned_loss=0.01685, audio_tagging_loss=0.01047, over 15026.00 frames. ], tot_loss[loss=0.08738, simple_loss=0.1062, pruned_loss=0.0238, audio_tagging_loss=0.01049, over 3045373.17 frames. ], batch size: 55, lr: 7.79e-03, grad_scale: 16.0 2023-11-19 09:14:37,521 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=667240.0, ans=0.125 2023-11-19 09:14:37,577 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=667240.0, ans=0.2 2023-11-19 09:14:39,986 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.29 vs. 
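The recurring train_asr.py:1319 WARNINGs show the utterance filter at work: these AudioSet clips carry a dummy placeholder transcript, and after subsampling they have fewer frames (23) than BPE tokens (24), which a transducer loss cannot align (it needs at least one frame per token). The logged frame arithmetic matches T' = ((T - 7) // 2 + 1) // 2 for the 4x subsampling used here (100 -> 23). A sketch of such a filter predicate, with the helper names as assumptions:

    def frames_after_subsampling(num_frames: int) -> int:
        # Matches the logged numbers: 100 frames -> 23 after 4x subsampling.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, tokens: list) -> bool:
        """Drop cuts a transducer cannot align: fewer frames than tokens."""
        t = frames_after_subsampling(num_frames)
        if t < len(tokens):
            # Logged as: "Exclude cut with ID ... from training."
            return False
        return True

    print(frames_after_subsampling(100))   # -> 23
    print(keep_cut(100, ["tok"] * 24))     # -> False, hence the WARNING
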
limit=12.0 2023-11-19 09:14:40,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=667240.0, ans=0.2 2023-11-19 09:14:45,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=667306.6666666666, ans=0.125 2023-11-19 09:14:52,708 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.594e+01 8.271e+01 8.958e+01 9.768e+01 1.292e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-19 09:14:56,153 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=667373.3333333334, ans=0.125 2023-11-19 09:14:56,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=667373.3333333334, ans=0.125 2023-11-19 09:15:00,459 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=667373.3333333334, ans=0.125 2023-11-19 09:15:03,655 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.51 vs. limit=15.0 2023-11-19 09:15:06,855 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.89 vs. limit=15.0 2023-11-19 09:15:30,405 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 3950, loss[loss=0.0849, simple_loss=0.1077, pruned_loss=0.02144, audio_tagging_loss=0.009609, over 15238.00 frames. ], tot_loss[loss=0.08778, simple_loss=0.1063, pruned_loss=0.02391, audio_tagging_loss=0.01069, over 3054845.81 frames. ], batch size: 57, lr: 7.79e-03, grad_scale: 16.0 2023-11-19 09:15:42,677 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.60 vs. limit=15.0 2023-11-19 09:15:49,265 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=667640.0, ans=0.125 2023-11-19 09:16:10,470 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=667773.3333333334, ans=0.125 2023-11-19 09:16:25,264 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 4000, loss[loss=0.1274, simple_loss=0.1626, pruned_loss=0.03862, audio_tagging_loss=0.007504, over 16122.00 frames. ], tot_loss[loss=0.08864, simple_loss=0.1074, pruned_loss=0.0242, audio_tagging_loss=0.01073, over 3056954.95 frames. 
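The grad_scale field flips between 16.0 and 32.0 across these records (16.0 through batch 3950, 32.0 again by batch 4000), which is consistent with standard dynamic loss scaling for an fp16 run: the scale is halved when a step overflows and doubled again after a stretch of finite steps. A generic sketch of that policy; the growth interval and factors are assumptions, not the values used here:

    import math

    class DynamicGradScaler:
        """Halve on overflow, double after N clean steps (illustrative sketch)."""

        def __init__(self, init_scale: float = 32.0, growth_interval: int = 2000):
            self.scale = init_scale
            self.growth_interval = growth_interval
            self._clean_steps = 0

        def update(self, grad_norm: float) -> None:
            if not math.isfinite(grad_norm):
                self.scale /= 2.0          # overflow: back off immediately
                self._clean_steps = 0
            else:
                self._clean_steps += 1
                if self._clean_steps >= self.growth_interval:
                    self.scale *= 2.0      # stable: try a larger scale again
                    self._clean_steps = 0

    scaler = DynamicGradScaler(init_scale=16.0)
    for _ in range(2000):
        scaler.update(grad_norm=95.0)      # a long run of finite gradients
    print(scaler.scale)                    # -> 32.0
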
], batch size: 55, lr: 7.78e-03, grad_scale: 32.0 2023-11-19 09:16:30,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=667906.6666666666, ans=0.2 2023-11-19 09:16:36,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=667973.3333333334, ans=0.125 2023-11-19 09:16:36,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=667973.3333333334, ans=0.2 2023-11-19 09:16:40,974 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=667973.3333333334, ans=0.125 2023-11-19 09:16:45,238 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 8.595e+01 9.425e+01 1.030e+02 1.465e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-19 09:16:57,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=668040.0, ans=0.1 2023-11-19 09:17:04,024 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=668106.6666666666, ans=0.1 2023-11-19 09:17:22,408 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 4050, loss[loss=0.09391, simple_loss=0.1112, pruned_loss=0.02869, audio_tagging_loss=0.009634, over 14681.00 frames. ], tot_loss[loss=0.08819, simple_loss=0.1071, pruned_loss=0.02394, audio_tagging_loss=0.01069, over 3052908.08 frames. ], batch size: 54, lr: 7.78e-03, grad_scale: 32.0 2023-11-19 09:17:22,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=668240.0, ans=0.1 2023-11-19 09:17:22,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=668240.0, ans=0.125 2023-11-19 09:17:23,541 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 09:17:32,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=668306.6666666666, ans=0.09899494936611666 2023-11-19 09:17:49,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=668373.3333333334, ans=0.125 2023-11-19 09:18:04,464 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=668440.0, ans=0.0 2023-11-19 09:18:14,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=668506.6666666666, ans=0.125 2023-11-19 09:18:17,768 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 4100, loss[loss=0.09239, simple_loss=0.1264, pruned_loss=0.02212, audio_tagging_loss=0.007055, over 16098.00 frames. ], tot_loss[loss=0.08844, simple_loss=0.1074, pruned_loss=0.02404, audio_tagging_loss=0.01068, over 3059697.79 frames. 
], batch size: 57, lr: 7.78e-03, grad_scale: 16.0 2023-11-19 09:18:20,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=668573.3333333334, ans=0.125 2023-11-19 09:18:25,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=668573.3333333334, ans=0.0 2023-11-19 09:18:37,096 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=668640.0, ans=0.0 2023-11-19 09:18:37,287 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=668640.0, ans=0.0 2023-11-19 09:18:37,942 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 8.527e+01 9.124e+01 9.950e+01 1.525e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-19 09:18:41,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=668706.6666666666, ans=0.0 2023-11-19 09:19:11,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=668840.0, ans=0.1 2023-11-19 09:19:13,573 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 4150, loss[loss=0.1113, simple_loss=0.1483, pruned_loss=0.03087, audio_tagging_loss=0.006266, over 15914.00 frames. ], tot_loss[loss=0.08861, simple_loss=0.1079, pruned_loss=0.02418, audio_tagging_loss=0.01049, over 3052558.96 frames. ], batch size: 55, lr: 7.78e-03, grad_scale: 16.0 2023-11-19 09:19:17,157 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=22.5 2023-11-19 09:19:22,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=668906.6666666666, ans=0.125 2023-11-19 09:19:53,175 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 09:19:57,559 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=669173.3333333334, ans=0.0 2023-11-19 09:20:02,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=669173.3333333334, ans=0.2 2023-11-19 09:20:04,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=669173.3333333334, ans=0.125 2023-11-19 09:20:10,264 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 4200, loss[loss=0.08356, simple_loss=0.09649, pruned_loss=0.02429, audio_tagging_loss=0.01103, over 15855.00 frames. ], tot_loss[loss=0.08823, simple_loss=0.1074, pruned_loss=0.02401, audio_tagging_loss=0.01054, over 3051140.61 frames. 
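The scaling.py:1022 "Whitening" lines compare a per-module metric against a scheduled limit (the "metric=X vs. limit=Y" pairs). One way to read the metric, and the basis of the sketch below, is as a measure of how far the feature covariance is from a multiple of the identity: mean(eig^2) / mean(eig)^2 equals 1 for perfectly white features and grows as the spectrum becomes lopsided. This is an assumption about the exact formula, a model of the printout rather than the icefall code:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        """x: (num_frames, num_channels). 1.0 == white; larger == less white."""
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]              # channel covariance
        eigs = torch.linalg.eigvalsh(cov)         # symmetric -> real spectrum
        return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

    x = torch.randn(10000, 384)                   # near-white features
    print(whitening_metric(x))                    # -> near 1.0
    print(whitening_metric(x * torch.rand(384)))  # -> larger: unequal channel scales
    # When the metric exceeds the scheduled limit, a penalty nudges it back down.
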
], batch size: 59, lr: 7.78e-03, grad_scale: 16.0 2023-11-19 09:20:15,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=669240.0, ans=0.125 2023-11-19 09:20:21,136 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.90 vs. limit=15.0 2023-11-19 09:20:27,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=669306.6666666666, ans=0.125 2023-11-19 09:20:30,746 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.637e+01 8.694e+01 9.811e+01 1.116e+02 1.412e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-19 09:20:30,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=669306.6666666666, ans=0.125 2023-11-19 09:20:57,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=669506.6666666666, ans=10.0 2023-11-19 09:21:06,477 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 4250, loss[loss=0.1004, simple_loss=0.1275, pruned_loss=0.02582, audio_tagging_loss=0.01087, over 16094.00 frames. ], tot_loss[loss=0.0887, simple_loss=0.1081, pruned_loss=0.02422, audio_tagging_loss=0.01045, over 3056281.31 frames. ], batch size: 57, lr: 7.77e-03, grad_scale: 16.0 2023-11-19 09:21:21,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=669640.0, ans=0.125 2023-11-19 09:21:22,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=669640.0, ans=0.2 2023-11-19 09:21:37,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=669706.6666666666, ans=0.025 2023-11-19 09:21:41,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=669773.3333333334, ans=0.125 2023-11-19 09:21:44,572 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=669773.3333333334, ans=0.95 2023-11-19 09:21:54,450 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.40 vs. limit=15.0 2023-11-19 09:21:59,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=669840.0, ans=0.125 2023-11-19 09:22:02,687 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 4300, loss[loss=0.09172, simple_loss=0.1099, pruned_loss=0.02935, audio_tagging_loss=0.007426, over 15300.00 frames. ], tot_loss[loss=0.08911, simple_loss=0.1086, pruned_loss=0.02445, audio_tagging_loss=0.01037, over 3049900.99 frames. 
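Many of the scheduled names in these records belong to Balancer modules (balancer.prob, max_positive=0.95, min_positive=0.025, min_abs/max_abs). The reported parameters suggest a module that, with probability prob, checks per-channel statistics (the fraction of positive activations, the mean absolute value) against [min_positive, max_positive] and [min_abs, max_abs] and adjusts gradients for channels that drift outside. The sketch below implements only the diagnostic half (finding out-of-range channels); the gradient-modification half is icefall-specific and omitted:

    import torch

    def out_of_range_channels(x: torch.Tensor,
                              min_positive: float = 0.025,
                              max_positive: float = 0.95,
                              min_abs: float = 0.2,
                              max_abs: float = 10.0) -> torch.Tensor:
        """x: (num_frames, num_channels). Bool mask of unhealthy channels."""
        frac_positive = (x > 0).float().mean(dim=0)
        mean_abs = x.abs().mean(dim=0)
        return ((frac_positive < min_positive) | (frac_positive > max_positive) |
                (mean_abs < min_abs) | (mean_abs > max_abs))

    x = torch.randn(4096, 256)
    x[:, 0] = x[:, 0].abs()   # channel 0: always positive -> flagged
    print(out_of_range_channels(x).nonzero().flatten())  # -> tensor([0])
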
], batch size: 57, lr: 7.77e-03, grad_scale: 16.0 2023-11-19 09:22:22,813 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.156e+01 8.641e+01 9.552e+01 1.070e+02 1.517e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-19 09:22:28,288 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=670040.0, ans=0.125 2023-11-19 09:22:34,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=670040.0, ans=0.0 2023-11-19 09:22:38,873 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.437e-02 2023-11-19 09:22:38,941 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=670106.6666666666, ans=0.1 2023-11-19 09:22:57,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=670240.0, ans=0.1 2023-11-19 09:22:58,317 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 4350, loss[loss=0.1188, simple_loss=0.1498, pruned_loss=0.03875, audio_tagging_loss=0.005134, over 15084.00 frames. ], tot_loss[loss=0.08968, simple_loss=0.1095, pruned_loss=0.02463, audio_tagging_loss=0.01032, over 3049070.58 frames. ], batch size: 55, lr: 7.77e-03, grad_scale: 16.0 2023-11-19 09:23:15,123 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=6.840e-02 2023-11-19 09:23:30,230 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.74 vs. limit=22.5 2023-11-19 09:23:46,408 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.88 vs. limit=15.0 2023-11-19 09:23:54,175 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 4400, loss[loss=0.09707, simple_loss=0.1237, pruned_loss=0.0259, audio_tagging_loss=0.009337, over 15308.00 frames. ], tot_loss[loss=0.08951, simple_loss=0.1092, pruned_loss=0.02465, audio_tagging_loss=0.01027, over 3051816.80 frames. ], batch size: 56, lr: 7.77e-03, grad_scale: 32.0 2023-11-19 09:24:05,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=670640.0, ans=0.1 2023-11-19 09:24:13,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=670640.0, ans=0.125 2023-11-19 09:24:14,410 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.076e+01 8.724e+01 9.307e+01 1.083e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-19 09:24:31,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=670773.3333333334, ans=0.0 2023-11-19 09:24:32,542 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=670773.3333333334, ans=0.125 2023-11-19 09:24:43,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=670840.0, ans=0.125 2023-11-19 09:24:49,635 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 4450, loss[loss=0.0819, simple_loss=0.106, pruned_loss=0.01878, audio_tagging_loss=0.01012, over 15936.00 frames. 
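The scaling.py:1118 "WithLoss" lines attach a named auxiliary loss to a tensor inside the model (here the self-attention weights) and report its running sum; loss-sum=0.000e+00 means the penalty is currently inactive. A generic way to implement such pass-through loss hooks, offered as an assumption about the mechanism rather than the icefall code:

    import torch

    AUX_LOSSES: dict[str, torch.Tensor] = {}

    def with_loss(x: torch.Tensor, loss: torch.Tensor, name: str) -> torch.Tensor:
        """Record `loss` under `name` and return `x` unchanged (pass-through)."""
        AUX_LOSSES[name] = AUX_LOSSES.get(name, torch.zeros(())) + loss
        return x

    attn = torch.rand(4, 8, 16, 16)
    penalty = torch.relu(attn - 0.99).sum()   # e.g. penalize saturated weights
    attn = with_loss(attn, penalty, "layers.0.self_attn_weights")
    main_loss = attn.mean()                   # stand-in for the real loss
    total = main_loss + sum(AUX_LOSSES.values())
    print({k: f"{v.item():.3e}" for k, v in AUX_LOSSES.items()})
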
], tot_loss[loss=0.08889, simple_loss=0.1084, pruned_loss=0.02438, audio_tagging_loss=0.01032, over 3059273.29 frames. ], batch size: 59, lr: 7.77e-03, grad_scale: 32.0 2023-11-19 09:25:29,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=671106.6666666666, ans=0.0 2023-11-19 09:25:43,839 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.28 vs. limit=12.0 2023-11-19 09:25:45,406 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 4500, loss[loss=0.08788, simple_loss=0.1093, pruned_loss=0.02527, audio_tagging_loss=0.007965, over 16733.00 frames. ], tot_loss[loss=0.0885, simple_loss=0.1079, pruned_loss=0.02423, audio_tagging_loss=0.0103, over 3061427.42 frames. ], batch size: 62, lr: 7.76e-03, grad_scale: 16.0 2023-11-19 09:26:06,066 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.847e+01 8.307e+01 9.155e+01 9.901e+01 1.565e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-19 09:26:16,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=671373.3333333334, ans=0.125 2023-11-19 09:26:20,120 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=671440.0, ans=0.125 2023-11-19 09:26:35,969 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=671506.6666666666, ans=0.0 2023-11-19 09:26:37,387 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.76 vs. limit=15.0 2023-11-19 09:26:40,235 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=671573.3333333334, ans=0.0 2023-11-19 09:26:41,060 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 4550, loss[loss=0.104, simple_loss=0.1368, pruned_loss=0.0285, audio_tagging_loss=0.007046, over 15063.00 frames. ], tot_loss[loss=0.08818, simple_loss=0.1076, pruned_loss=0.02401, audio_tagging_loss=0.01036, over 3058504.16 frames. ], batch size: 54, lr: 7.76e-03, grad_scale: 16.0 2023-11-19 09:26:46,618 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=671573.3333333334, ans=0.125 2023-11-19 09:27:06,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=671706.6666666666, ans=0.0 2023-11-19 09:27:22,354 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 09:27:36,503 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 4600, loss[loss=0.08476, simple_loss=0.0938, pruned_loss=0.02254, audio_tagging_loss=0.01532, over 16165.00 frames. ], tot_loss[loss=0.08836, simple_loss=0.1077, pruned_loss=0.02411, audio_tagging_loss=0.01039, over 3053484.42 frames. 
], batch size: 62, lr: 7.76e-03, grad_scale: 16.0 2023-11-19 09:27:55,358 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=671973.3333333334, ans=0.125 2023-11-19 09:27:57,399 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=671973.3333333334, ans=0.0 2023-11-19 09:27:58,267 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.100e+01 8.562e+01 9.559e+01 1.086e+02 1.814e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-19 09:28:00,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=672040.0, ans=0.125 2023-11-19 09:28:11,581 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.52 vs. limit=15.0 2023-11-19 09:28:15,925 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=672106.6666666666, ans=0.5 2023-11-19 09:28:20,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=672173.3333333334, ans=0.125 2023-11-19 09:28:24,977 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=672173.3333333334, ans=0.0 2023-11-19 09:28:31,706 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=672240.0, ans=0.2 2023-11-19 09:28:32,583 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 4650, loss[loss=0.1212, simple_loss=0.1572, pruned_loss=0.03503, audio_tagging_loss=0.007582, over 15958.00 frames. ], tot_loss[loss=0.08755, simple_loss=0.1063, pruned_loss=0.02374, audio_tagging_loss=0.01065, over 3051631.17 frames. ], batch size: 55, lr: 7.76e-03, grad_scale: 16.0 2023-11-19 09:28:36,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=672240.0, ans=0.1 2023-11-19 09:29:02,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=672373.3333333334, ans=0.125 2023-11-19 09:29:26,905 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.27 vs. limit=15.0 2023-11-19 09:29:28,493 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 4700, loss[loss=0.08036, simple_loss=0.09485, pruned_loss=0.02011, audio_tagging_loss=0.01283, over 15271.00 frames. ], tot_loss[loss=0.08794, simple_loss=0.1067, pruned_loss=0.02396, audio_tagging_loss=0.01064, over 3047238.60 frames. 
], batch size: 55, lr: 7.76e-03, grad_scale: 16.0 2023-11-19 09:29:28,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=672573.3333333334, ans=0.125 2023-11-19 09:29:29,694 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=672573.3333333334, ans=0.125 2023-11-19 09:29:39,791 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=672640.0, ans=0.125 2023-11-19 09:29:46,247 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=672640.0, ans=0.1 2023-11-19 09:29:49,805 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.777e+01 8.341e+01 9.226e+01 1.015e+02 1.641e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-19 09:29:56,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=672706.6666666666, ans=0.1 2023-11-19 09:30:02,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=672773.3333333334, ans=0.2 2023-11-19 09:30:03,934 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=672773.3333333334, ans=0.0 2023-11-19 09:30:12,770 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.40 vs. limit=22.5 2023-11-19 09:30:19,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=672840.0, ans=0.125 2023-11-19 09:30:24,325 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 4750, loss[loss=0.07402, simple_loss=0.08784, pruned_loss=0.01859, audio_tagging_loss=0.01151, over 15632.00 frames. ], tot_loss[loss=0.08878, simple_loss=0.1076, pruned_loss=0.02436, audio_tagging_loss=0.0106, over 3042765.15 frames. ], batch size: 59, lr: 7.76e-03, grad_scale: 16.0 2023-11-19 09:30:25,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=672906.6666666666, ans=0.125 2023-11-19 09:30:28,759 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=672906.6666666666, ans=0.09899494936611666 2023-11-19 09:30:37,249 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=672973.3333333334, ans=0.09899494936611666 2023-11-19 09:30:40,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=672973.3333333334, ans=0.1 2023-11-19 09:30:55,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=673040.0, ans=0.1 2023-11-19 09:31:05,097 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.49 vs. 
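Several of the scheduled names here are *_skip_rate values (attention_skip_rate, conv_skip_rate, ff2_skip_rate, bypass.skip_rate; typically 0.0 or about 0.099 in this section): during training a sub-module's output can be randomly dropped with that probability, leaving only the residual path, as a regularizer. A minimal sketch of the idea, assuming independent per-batch Bernoulli skipping:

    import torch

    def maybe_skip(module_out: torch.Tensor, residual: torch.Tensor,
                   skip_rate: float, training: bool = True) -> torch.Tensor:
        """With probability skip_rate, drop the module and keep the residual."""
        if training and torch.rand(()) < skip_rate:
            return residual
        return residual + module_out

    res = torch.randn(10, 256)
    out = torch.randn(10, 256)
    y = maybe_skip(out, res, skip_rate=0.09899494936611666)  # value from the log
    print(y.shape)  # torch.Size([10, 256])
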
limit=15.0 2023-11-19 09:31:07,879 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=673173.3333333334, ans=0.0 2023-11-19 09:31:10,044 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=673173.3333333334, ans=0.2 2023-11-19 09:31:12,054 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=673173.3333333334, ans=0.0 2023-11-19 09:31:20,473 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 4800, loss[loss=0.06484, simple_loss=0.07662, pruned_loss=0.01609, audio_tagging_loss=0.01043, over 14500.00 frames. ], tot_loss[loss=0.08877, simple_loss=0.1072, pruned_loss=0.02442, audio_tagging_loss=0.01073, over 3049803.98 frames. ], batch size: 57, lr: 7.75e-03, grad_scale: 32.0 2023-11-19 09:31:37,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=673306.6666666666, ans=0.125 2023-11-19 09:31:41,638 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.288e+01 8.950e+01 9.768e+01 1.286e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-19 09:31:48,827 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 09:32:04,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=673506.6666666666, ans=0.125 2023-11-19 09:32:15,758 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=673573.3333333334, ans=0.125 2023-11-19 09:32:16,676 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 4850, loss[loss=0.09936, simple_loss=0.1206, pruned_loss=0.02821, audio_tagging_loss=0.01083, over 15015.00 frames. ], tot_loss[loss=0.08827, simple_loss=0.1065, pruned_loss=0.02416, audio_tagging_loss=0.01086, over 3045759.76 frames. ], batch size: 57, lr: 7.75e-03, grad_scale: 32.0 2023-11-19 09:32:26,455 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=673640.0, ans=0.125 2023-11-19 09:32:35,970 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0 2023-11-19 09:32:39,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=673706.6666666666, ans=0.0 2023-11-19 09:33:01,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=673840.0, ans=0.1 2023-11-19 09:33:09,442 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.73 vs. limit=6.0 2023-11-19 09:33:12,503 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 4900, loss[loss=0.06381, simple_loss=0.07879, pruned_loss=0.01434, audio_tagging_loss=0.01008, over 13529.00 frames. ], tot_loss[loss=0.08788, simple_loss=0.1061, pruned_loss=0.02403, audio_tagging_loss=0.0108, over 3043359.02 frames. 
], batch size: 52, lr: 7.75e-03, grad_scale: 32.0 2023-11-19 09:33:19,014 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=673906.6666666666, ans=0.125 2023-11-19 09:33:27,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=673973.3333333334, ans=0.125 2023-11-19 09:33:28,246 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.44 vs. limit=15.0 2023-11-19 09:33:33,593 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.273e+01 8.373e+01 9.037e+01 1.012e+02 1.386e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-19 09:34:02,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=674173.3333333334, ans=0.1 2023-11-19 09:34:07,882 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 4950, loss[loss=0.07485, simple_loss=0.08708, pruned_loss=0.02015, audio_tagging_loss=0.01116, over 17016.00 frames. ], tot_loss[loss=0.08743, simple_loss=0.1057, pruned_loss=0.02401, audio_tagging_loss=0.01056, over 3044141.56 frames. ], batch size: 64, lr: 7.75e-03, grad_scale: 32.0 2023-11-19 09:34:09,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=674240.0, ans=0.2 2023-11-19 09:34:39,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=674373.3333333334, ans=0.125 2023-11-19 09:35:04,108 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 5000, loss[loss=0.08909, simple_loss=0.1187, pruned_loss=0.02215, audio_tagging_loss=0.007577, over 15551.00 frames. ], tot_loss[loss=0.08768, simple_loss=0.1066, pruned_loss=0.02403, audio_tagging_loss=0.01034, over 3044562.33 frames. ], batch size: 56, lr: 7.75e-03, grad_scale: 32.0 2023-11-19 09:35:06,429 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 09:35:06,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=674573.3333333334, ans=0.125 2023-11-19 09:35:15,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=674640.0, ans=0.2 2023-11-19 09:35:25,265 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.803e+01 8.355e+01 9.053e+01 1.007e+02 1.287e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-19 09:35:46,206 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=674773.3333333334, ans=0.0 2023-11-19 09:35:47,758 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.31 vs. limit=22.5 2023-11-19 09:35:48,365 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=674840.0, ans=0.1 2023-11-19 09:35:59,629 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 5050, loss[loss=0.1028, simple_loss=0.1193, pruned_loss=0.03132, audio_tagging_loss=0.01185, over 16239.00 frames. ], tot_loss[loss=0.0875, simple_loss=0.1062, pruned_loss=0.02413, audio_tagging_loss=0.01027, over 3045828.89 frames. 
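The lr field decays slowly across this section (7.81e-03 down toward 7.71e-03 over a few hundred batches within epoch 9), consistent with a smooth schedule that depends on both batch index and epoch. The sketch below uses an Eden-style formula as the assumed schedule family; base_lr, lr_batches, and lr_epochs are illustrative placeholders, and no attempt is made to reproduce the exact logged values:

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        """Eden-style schedule: smooth decay in both batch and epoch (sketch)."""
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # The learning rate shrinks slowly at this depth into training:
    print(eden_lr(0.045, batch=663_573, epoch=9))
    print(eden_lr(0.045, batch=675_000, epoch=9))  # slightly smaller
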
], batch size: 63, lr: 7.74e-03, grad_scale: 32.0 2023-11-19 09:36:31,435 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=675040.0, ans=0.1 2023-11-19 09:36:31,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=675040.0, ans=0.125 2023-11-19 09:36:34,627 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=675106.6666666666, ans=0.125 2023-11-19 09:36:36,737 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=675106.6666666666, ans=0.05 2023-11-19 09:36:38,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=675106.6666666666, ans=0.125 2023-11-19 09:36:44,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=675173.3333333334, ans=0.1 2023-11-19 09:36:44,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=675173.3333333334, ans=0.125 2023-11-19 09:36:55,049 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 5100, loss[loss=0.08041, simple_loss=0.1002, pruned_loss=0.02148, audio_tagging_loss=0.008843, over 14423.00 frames. ], tot_loss[loss=0.08784, simple_loss=0.1065, pruned_loss=0.02433, audio_tagging_loss=0.01026, over 3041607.18 frames. ], batch size: 57, lr: 7.74e-03, grad_scale: 32.0 2023-11-19 09:36:59,218 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=675240.0, ans=15.0 2023-11-19 09:37:03,107 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 09:37:03,323 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.38 vs. limit=15.0 2023-11-19 09:37:15,295 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=675306.6666666666, ans=0.04949747468305833 2023-11-19 09:37:16,147 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 8.361e+01 9.263e+01 1.052e+02 1.984e+02, threshold=1.853e+02, percent-clipped=1.0 2023-11-19 09:37:27,590 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=675440.0, ans=0.125 2023-11-19 09:37:50,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=675573.3333333334, ans=0.0 2023-11-19 09:37:51,133 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 5150, loss[loss=0.05745, simple_loss=0.06601, pruned_loss=0.01163, audio_tagging_loss=0.01282, over 14758.00 frames. ], tot_loss[loss=0.08788, simple_loss=0.1068, pruned_loss=0.02418, audio_tagging_loss=0.01028, over 3044086.14 frames. ], batch size: 59, lr: 7.74e-03, grad_scale: 32.0 2023-11-19 09:37:56,851 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.16 vs. 
limit=22.5 2023-11-19 09:37:59,797 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 09:38:00,958 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=675640.0, ans=0.125 2023-11-19 09:38:11,400 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.78 vs. limit=15.0 2023-11-19 09:38:15,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=675706.6666666666, ans=0.2 2023-11-19 09:38:18,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=675706.6666666666, ans=0.1 2023-11-19 09:38:46,044 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 5200, loss[loss=0.09387, simple_loss=0.1281, pruned_loss=0.02276, audio_tagging_loss=0.007064, over 15249.00 frames. ], tot_loss[loss=0.08888, simple_loss=0.1084, pruned_loss=0.0245, audio_tagging_loss=0.0102, over 3043780.34 frames. ], batch size: 54, lr: 7.74e-03, grad_scale: 32.0 2023-11-19 09:38:48,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=675906.6666666666, ans=0.125 2023-11-19 09:39:01,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=675973.3333333334, ans=0.2 2023-11-19 09:39:07,242 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.810e+01 8.521e+01 9.161e+01 1.002e+02 1.521e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-19 09:39:09,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=676040.0, ans=22.5 2023-11-19 09:39:27,814 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.24 vs. limit=10.0 2023-11-19 09:39:41,616 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 5250, loss[loss=0.06236, simple_loss=0.06538, pruned_loss=0.01336, audio_tagging_loss=0.01631, over 14376.00 frames. ], tot_loss[loss=0.08882, simple_loss=0.108, pruned_loss=0.02454, audio_tagging_loss=0.01029, over 3034679.87 frames. ], batch size: 56, lr: 7.74e-03, grad_scale: 32.0 2023-11-19 09:39:45,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=676240.0, ans=0.125 2023-11-19 09:40:01,353 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=676306.6666666666, ans=0.125 2023-11-19 09:40:09,154 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.90 vs. limit=15.0 2023-11-19 09:40:16,932 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.26 vs. limit=15.0 2023-11-19 09:40:37,336 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 5300, loss[loss=0.07808, simple_loss=0.09318, pruned_loss=0.01975, audio_tagging_loss=0.01175, over 14218.00 frames. ], tot_loss[loss=0.08861, simple_loss=0.1075, pruned_loss=0.02446, audio_tagging_loss=0.01037, over 3037893.35 frames. 
], batch size: 55, lr: 7.73e-03, grad_scale: 32.0 2023-11-19 09:40:54,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=676640.0, ans=0.0 2023-11-19 09:40:56,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=676640.0, ans=0.0 2023-11-19 09:40:58,530 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.778e+01 8.435e+01 9.072e+01 1.015e+02 1.516e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-19 09:41:04,012 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=676706.6666666666, ans=0.95 2023-11-19 09:41:08,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=676706.6666666666, ans=0.125 2023-11-19 09:41:08,462 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=676706.6666666666, ans=0.125 2023-11-19 09:41:32,750 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 5350, loss[loss=0.08905, simple_loss=0.105, pruned_loss=0.02694, audio_tagging_loss=0.009619, over 14470.00 frames. ], tot_loss[loss=0.08888, simple_loss=0.1077, pruned_loss=0.02473, audio_tagging_loss=0.01028, over 3037829.93 frames. ], batch size: 56, lr: 7.73e-03, grad_scale: 32.0 2023-11-19 09:41:37,208 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=676906.6666666666, ans=0.0 2023-11-19 09:41:45,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=676973.3333333334, ans=0.0 2023-11-19 09:42:02,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=677040.0, ans=0.1 2023-11-19 09:42:05,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=677106.6666666666, ans=0.125 2023-11-19 09:42:23,323 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=677173.3333333334, ans=0.2 2023-11-19 09:42:25,354 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=677173.3333333334, ans=0.125 2023-11-19 09:42:28,293 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 5400, loss[loss=0.09701, simple_loss=0.118, pruned_loss=0.02721, audio_tagging_loss=0.01081, over 14931.00 frames. ], tot_loss[loss=0.08835, simple_loss=0.1071, pruned_loss=0.02441, audio_tagging_loss=0.01039, over 3037682.66 frames. 
], batch size: 55, lr: 7.73e-03, grad_scale: 32.0 2023-11-19 09:42:45,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=677306.6666666666, ans=0.2 2023-11-19 09:42:49,671 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.399e+01 8.356e+01 9.040e+01 1.006e+02 1.272e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 09:43:09,929 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=677440.0, ans=0.0 2023-11-19 09:43:09,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=677440.0, ans=0.125 2023-11-19 09:43:12,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=677506.6666666666, ans=0.125 2023-11-19 09:43:24,107 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 5450, loss[loss=0.09198, simple_loss=0.1155, pruned_loss=0.02377, audio_tagging_loss=0.01046, over 15656.00 frames. ], tot_loss[loss=0.08842, simple_loss=0.1069, pruned_loss=0.02444, audio_tagging_loss=0.01054, over 3041707.26 frames. ], batch size: 57, lr: 7.73e-03, grad_scale: 32.0 2023-11-19 09:43:49,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=677706.6666666666, ans=0.125 2023-11-19 09:43:54,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=677706.6666666666, ans=0.125 2023-11-19 09:43:55,781 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=677706.6666666666, ans=10.0 2023-11-19 09:44:07,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=677773.3333333334, ans=15.0 2023-11-19 09:44:08,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=677840.0, ans=0.125 2023-11-19 09:44:10,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=677840.0, ans=0.125 2023-11-19 09:44:20,024 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 5500, loss[loss=0.1078, simple_loss=0.1315, pruned_loss=0.03139, audio_tagging_loss=0.01068, over 14393.00 frames. ], tot_loss[loss=0.08901, simple_loss=0.1078, pruned_loss=0.02466, audio_tagging_loss=0.01048, over 3040463.28 frames. ], batch size: 54, lr: 7.73e-03, grad_scale: 32.0 2023-11-19 09:44:21,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=677906.6666666666, ans=0.95 2023-11-19 09:44:29,313 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.45 vs. limit=15.0 2023-11-19 09:44:41,361 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.921e+01 8.594e+01 9.259e+01 1.001e+02 1.326e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-19 09:45:16,534 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 5550, loss[loss=0.1112, simple_loss=0.1334, pruned_loss=0.03627, audio_tagging_loss=0.008248, over 14384.00 frames. 
], tot_loss[loss=0.089, simple_loss=0.1077, pruned_loss=0.02456, audio_tagging_loss=0.01059, over 3037984.26 frames. ], batch size: 55, lr: 7.72e-03, grad_scale: 32.0 2023-11-19 09:45:19,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=678240.0, ans=0.0 2023-11-19 09:45:26,949 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 09:45:29,334 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.28 vs. limit=15.0 2023-11-19 09:45:38,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=678373.3333333334, ans=0.125 2023-11-19 09:45:48,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=678440.0, ans=0.2 2023-11-19 09:45:54,267 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.82 vs. limit=10.0 2023-11-19 09:45:58,630 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.18 vs. limit=10.0 2023-11-19 09:46:00,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=678506.6666666666, ans=0.1 2023-11-19 09:46:02,421 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=678506.6666666666, ans=0.125 2023-11-19 09:46:03,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=678506.6666666666, ans=0.1 2023-11-19 09:46:05,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=678506.6666666666, ans=0.5 2023-11-19 09:46:12,185 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 5600, loss[loss=0.0634, simple_loss=0.06577, pruned_loss=0.01841, audio_tagging_loss=0.0121, over 13805.00 frames. ], tot_loss[loss=0.08857, simple_loss=0.1072, pruned_loss=0.02429, audio_tagging_loss=0.01071, over 3034083.92 frames. ], batch size: 54, lr: 7.72e-03, grad_scale: 32.0 2023-11-19 09:46:34,638 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.090e+01 8.355e+01 9.090e+01 1.021e+02 1.619e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-19 09:46:52,571 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 09:47:04,086 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=678840.0, ans=0.025 2023-11-19 09:47:07,182 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=678906.6666666666, ans=0.1 2023-11-19 09:47:07,976 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 5650, loss[loss=0.06697, simple_loss=0.08398, pruned_loss=0.01305, audio_tagging_loss=0.01192, over 15236.00 frames. ], tot_loss[loss=0.08826, simple_loss=0.1067, pruned_loss=0.02411, audio_tagging_loss=0.01083, over 3033877.22 frames. ], batch size: 58, lr: 7.72e-03, grad_scale: 16.0 2023-11-19 09:48:03,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=679240.0, ans=0.0 2023-11-19 09:48:04,376 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 5700, loss[loss=0.0608, simple_loss=0.07406, pruned_loss=0.01529, audio_tagging_loss=0.008486, over 14683.00 frames. ], tot_loss[loss=0.08815, simple_loss=0.1064, pruned_loss=0.02418, audio_tagging_loss=0.01075, over 3038273.17 frames. ], batch size: 56, lr: 7.72e-03, grad_scale: 16.0 2023-11-19 09:48:26,549 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 8.673e+01 9.391e+01 1.015e+02 1.366e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-19 09:48:29,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=679373.3333333334, ans=0.125 2023-11-19 09:48:33,529 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.29 vs. limit=10.0 2023-11-19 09:48:38,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=679440.0, ans=0.2 2023-11-19 09:48:55,322 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=679506.6666666666, ans=0.125 2023-11-19 09:48:57,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=679506.6666666666, ans=0.05 2023-11-19 09:48:59,868 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 5750, loss[loss=0.09689, simple_loss=0.1161, pruned_loss=0.02771, audio_tagging_loss=0.01113, over 16589.00 frames. ], tot_loss[loss=0.08686, simple_loss=0.1049, pruned_loss=0.02378, audio_tagging_loss=0.01064, over 3044911.55 frames. 
], batch size: 61, lr: 7.72e-03, grad_scale: 16.0 2023-11-19 09:49:03,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=679573.3333333334, ans=0.125 2023-11-19 09:49:11,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=679640.0, ans=0.125 2023-11-19 09:49:15,444 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=679640.0, ans=0.0 2023-11-19 09:49:23,453 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=679706.6666666666, ans=0.2 2023-11-19 09:49:25,442 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 09:49:42,603 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.55 vs. limit=15.0 2023-11-19 09:49:47,901 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=679840.0, ans=0.125 2023-11-19 09:49:55,316 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 5800, loss[loss=0.06912, simple_loss=0.08186, pruned_loss=0.01835, audio_tagging_loss=0.009841, over 15530.00 frames. ], tot_loss[loss=0.08731, simple_loss=0.1056, pruned_loss=0.02399, audio_tagging_loss=0.01052, over 3037746.92 frames. ], batch size: 60, lr: 7.72e-03, grad_scale: 16.0 2023-11-19 09:50:01,157 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.53 vs. limit=15.0 2023-11-19 09:50:01,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=679906.6666666666, ans=0.0 2023-11-19 09:50:17,691 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.685e+01 8.360e+01 9.012e+01 9.906e+01 1.267e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-19 09:50:22,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=680040.0, ans=0.125 2023-11-19 09:50:34,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=680106.6666666666, ans=0.2 2023-11-19 09:50:42,531 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.67 vs. limit=15.0 2023-11-19 09:50:46,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=680173.3333333334, ans=0.0 2023-11-19 09:50:50,891 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 5850, loss[loss=0.1334, simple_loss=0.1751, pruned_loss=0.037, audio_tagging_loss=0.008847, over 16080.00 frames. ], tot_loss[loss=0.0876, simple_loss=0.1061, pruned_loss=0.02402, audio_tagging_loss=0.01051, over 3036777.31 frames. ], batch size: 58, lr: 7.71e-03, grad_scale: 16.0 2023-11-19 09:51:00,530 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.20 vs. limit=22.5 2023-11-19 09:51:01,522 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.72 vs. 
limit=22.5 2023-11-19 09:51:01,565 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.67 vs. limit=15.0 2023-11-19 09:51:08,063 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 09:51:15,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=680373.3333333334, ans=0.0 2023-11-19 09:51:21,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=680373.3333333334, ans=0.125 2023-11-19 09:51:45,830 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=680573.3333333334, ans=0.0 2023-11-19 09:51:47,066 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 5900, loss[loss=0.1006, simple_loss=0.1317, pruned_loss=0.02756, audio_tagging_loss=0.00718, over 15673.00 frames. ], tot_loss[loss=0.08794, simple_loss=0.1065, pruned_loss=0.02424, audio_tagging_loss=0.01043, over 3032086.05 frames. ], batch size: 55, lr: 7.71e-03, grad_scale: 16.0 2023-11-19 09:52:06,916 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.400e-01 2023-11-19 09:52:08,762 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.892e+01 8.198e+01 8.843e+01 9.810e+01 1.400e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-19 09:52:31,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=680840.0, ans=0.125 2023-11-19 09:52:34,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=680840.0, ans=0.125 2023-11-19 09:52:42,549 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 5950, loss[loss=0.1087, simple_loss=0.143, pruned_loss=0.02967, audio_tagging_loss=0.00752, over 16039.00 frames. ], tot_loss[loss=0.08729, simple_loss=0.1055, pruned_loss=0.02399, audio_tagging_loss=0.01054, over 3040998.28 frames. ], batch size: 56, lr: 7.71e-03, grad_scale: 16.0 2023-11-19 09:52:55,936 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=680973.3333333334, ans=0.2 2023-11-19 09:53:19,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=681106.6666666666, ans=0.2 2023-11-19 09:53:25,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=681106.6666666666, ans=0.125 2023-11-19 09:53:25,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=681106.6666666666, ans=0.125 2023-11-19 09:53:38,039 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 6000, loss[loss=0.07298, simple_loss=0.08416, pruned_loss=0.01752, audio_tagging_loss=0.01338, over 14746.00 frames. ], tot_loss[loss=0.08698, simple_loss=0.1054, pruned_loss=0.02373, audio_tagging_loss=0.01053, over 3043027.58 frames. 
], batch size: 58, lr: 7.71e-03, grad_scale: 32.0 2023-11-19 09:53:38,040 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-19 09:53:54,360 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.9542, 0.7871, 2.1867, 2.1465, 2.6288, 2.4939, 2.4615, 2.3831], device='cuda:3') 2023-11-19 09:53:56,908 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.7036, 4.1267, 3.8182, 2.9571], device='cuda:3') 2023-11-19 09:54:10,898 INFO [train_asr.py:1147] (3/4) Epoch 9, validation: loss=0.06636, simple_loss=0.05607, pruned_loss=0.006778, audio_tagging_loss=0.03155, over 4681554.00 frames. 2023-11-19 09:54:10,899 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-19 09:54:33,947 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.777e+01 8.283e+01 9.118e+01 1.003e+02 1.340e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-19 09:54:50,016 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=681440.0, ans=0.125 2023-11-19 09:54:51,009 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 09:54:55,266 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.66 vs. limit=10.0 2023-11-19 09:55:03,904 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.669e-03 2023-11-19 09:55:06,813 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 6050, loss[loss=0.06688, simple_loss=0.08246, pruned_loss=0.01409, audio_tagging_loss=0.01156, over 13868.00 frames. ], tot_loss[loss=0.08741, simple_loss=0.1059, pruned_loss=0.02398, audio_tagging_loss=0.01048, over 3040009.38 frames. ], batch size: 54, lr: 7.71e-03, grad_scale: 16.0 2023-11-19 09:55:30,796 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=681706.6666666666, ans=0.125 2023-11-19 09:55:47,229 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=681773.3333333334, ans=0.0 2023-11-19 09:55:49,658 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. limit=6.0 2023-11-19 09:56:01,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=681906.6666666666, ans=0.125 2023-11-19 09:56:02,350 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 6100, loss[loss=0.1102, simple_loss=0.1396, pruned_loss=0.03072, audio_tagging_loss=0.009671, over 17139.00 frames. ], tot_loss[loss=0.08713, simple_loss=0.1058, pruned_loss=0.02378, audio_tagging_loss=0.01044, over 3034176.87 frames. 
], batch size: 60, lr: 7.70e-03, grad_scale: 16.0 2023-11-19 09:56:10,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=681906.6666666666, ans=0.125 2023-11-19 09:56:19,735 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.85 vs. limit=22.5 2023-11-19 09:56:26,131 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.696e+01 8.498e+01 9.519e+01 1.052e+02 1.737e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-19 09:56:26,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=682040.0, ans=0.0 2023-11-19 09:56:40,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=682106.6666666666, ans=0.1 2023-11-19 09:56:49,071 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=682173.3333333334, ans=0.2 2023-11-19 09:56:57,824 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 6150, loss[loss=0.08617, simple_loss=0.1065, pruned_loss=0.0218, audio_tagging_loss=0.01111, over 14621.00 frames. ], tot_loss[loss=0.08699, simple_loss=0.1056, pruned_loss=0.0237, audio_tagging_loss=0.01051, over 3045517.01 frames. ], batch size: 56, lr: 7.70e-03, grad_scale: 16.0 2023-11-19 09:57:27,696 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=682373.3333333334, ans=0.125 2023-11-19 09:57:31,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=682440.0, ans=0.04949747468305833 2023-11-19 09:57:53,347 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 6200, loss[loss=0.06504, simple_loss=0.07335, pruned_loss=0.01453, audio_tagging_loss=0.01383, over 15193.00 frames. ], tot_loss[loss=0.08719, simple_loss=0.1057, pruned_loss=0.02373, audio_tagging_loss=0.01062, over 3051287.63 frames. ], batch size: 59, lr: 7.70e-03, grad_scale: 16.0 2023-11-19 09:57:56,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=682573.3333333334, ans=0.0 2023-11-19 09:57:56,795 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=682573.3333333334, ans=0.125 2023-11-19 09:58:02,386 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.09 vs. limit=22.5 2023-11-19 09:58:16,315 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.684e+01 8.555e+01 9.157e+01 9.904e+01 1.201e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-19 09:58:45,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=682840.0, ans=0.2 2023-11-19 09:58:47,521 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.92 vs. limit=15.0 2023-11-19 09:58:49,116 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 6250, loss[loss=0.1189, simple_loss=0.1579, pruned_loss=0.03432, audio_tagging_loss=0.005588, over 15667.00 frames. ], tot_loss[loss=0.08704, simple_loss=0.1053, pruned_loss=0.02371, audio_tagging_loss=0.01066, over 3043732.75 frames. 
], batch size: 56, lr: 7.70e-03, grad_scale: 16.0 2023-11-19 09:59:07,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=682973.3333333334, ans=22.5 2023-11-19 09:59:32,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=683173.3333333334, ans=0.125 2023-11-19 09:59:44,724 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 6300, loss[loss=0.0864, simple_loss=0.1107, pruned_loss=0.02352, audio_tagging_loss=0.007525, over 15306.00 frames. ], tot_loss[loss=0.08715, simple_loss=0.1053, pruned_loss=0.02372, audio_tagging_loss=0.01077, over 3038803.96 frames. ], batch size: 57, lr: 7.70e-03, grad_scale: 16.0 2023-11-19 09:59:44,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=683240.0, ans=0.05 2023-11-19 09:59:49,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=683240.0, ans=0.125 2023-11-19 09:59:50,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=683240.0, ans=0.1 2023-11-19 09:59:52,779 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=683240.0, ans=0.025 2023-11-19 09:59:58,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=683306.6666666666, ans=0.2 2023-11-19 10:00:07,711 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.970e+01 8.511e+01 9.206e+01 1.011e+02 2.353e+02, threshold=1.841e+02, percent-clipped=1.0 2023-11-19 10:00:25,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=683440.0, ans=0.125 2023-11-19 10:00:28,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=683506.6666666666, ans=0.2 2023-11-19 10:00:33,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=683506.6666666666, ans=0.0 2023-11-19 10:00:34,875 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.24 vs. limit=15.0 2023-11-19 10:00:35,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=683506.6666666666, ans=0.125 2023-11-19 10:00:40,552 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 6350, loss[loss=0.09669, simple_loss=0.1224, pruned_loss=0.02741, audio_tagging_loss=0.008092, over 15121.00 frames. ], tot_loss[loss=0.08657, simple_loss=0.1047, pruned_loss=0.02344, audio_tagging_loss=0.01079, over 3034353.18 frames. 
], batch size: 57, lr: 7.69e-03, grad_scale: 16.0 2023-11-19 10:00:47,248 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=683573.3333333334, ans=0.0 2023-11-19 10:00:55,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=683640.0, ans=0.1 2023-11-19 10:01:18,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=683773.3333333334, ans=0.07 2023-11-19 10:01:18,336 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=683773.3333333334, ans=0.0 2023-11-19 10:01:35,457 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 6400, loss[loss=0.08957, simple_loss=0.1088, pruned_loss=0.02285, audio_tagging_loss=0.01231, over 15728.00 frames. ], tot_loss[loss=0.08689, simple_loss=0.1051, pruned_loss=0.02348, audio_tagging_loss=0.01085, over 3030007.12 frames. ], batch size: 57, lr: 7.69e-03, grad_scale: 32.0 2023-11-19 10:01:37,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=683906.6666666666, ans=0.07 2023-11-19 10:01:46,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=683973.3333333334, ans=0.015 2023-11-19 10:01:53,800 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.64 vs. limit=12.0 2023-11-19 10:01:59,222 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.664e+01 8.378e+01 8.903e+01 9.717e+01 1.251e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-19 10:02:14,026 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.16 vs. limit=15.0 2023-11-19 10:02:30,940 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 6450, loss[loss=0.08057, simple_loss=0.1126, pruned_loss=0.01729, audio_tagging_loss=0.006976, over 15024.00 frames. ], tot_loss[loss=0.08751, simple_loss=0.1052, pruned_loss=0.02382, audio_tagging_loss=0.0111, over 3032403.95 frames. ], batch size: 55, lr: 7.69e-03, grad_scale: 32.0 2023-11-19 10:02:48,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=684306.6666666666, ans=0.125 2023-11-19 10:03:10,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=684440.0, ans=0.2 2023-11-19 10:03:12,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=684440.0, ans=0.5 2023-11-19 10:03:27,120 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 6500, loss[loss=0.1069, simple_loss=0.1304, pruned_loss=0.03158, audio_tagging_loss=0.01007, over 15672.00 frames. ], tot_loss[loss=0.08729, simple_loss=0.1055, pruned_loss=0.02353, audio_tagging_loss=0.01103, over 3032988.50 frames. 
], batch size: 57, lr: 7.69e-03, grad_scale: 32.0 2023-11-19 10:03:30,415 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=684573.3333333334, ans=0.0 2023-11-19 10:03:31,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=684573.3333333334, ans=0.05 2023-11-19 10:03:32,900 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.73 vs. limit=15.0 2023-11-19 10:03:50,207 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.647e+01 8.426e+01 9.031e+01 9.982e+01 1.610e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-19 10:04:05,251 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=684773.3333333334, ans=0.0 2023-11-19 10:04:14,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=684840.0, ans=0.2 2023-11-19 10:04:20,819 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.65 vs. limit=15.0 2023-11-19 10:04:22,322 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 6550, loss[loss=0.08116, simple_loss=0.09932, pruned_loss=0.02128, audio_tagging_loss=0.01022, over 14700.00 frames. ], tot_loss[loss=0.08674, simple_loss=0.1049, pruned_loss=0.02345, audio_tagging_loss=0.01083, over 3031989.83 frames. ], batch size: 53, lr: 7.69e-03, grad_scale: 32.0 2023-11-19 10:04:24,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=684906.6666666666, ans=0.2 2023-11-19 10:04:47,404 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=685040.0, ans=0.125 2023-11-19 10:05:18,131 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 6600, loss[loss=0.08343, simple_loss=0.1052, pruned_loss=0.0218, audio_tagging_loss=0.009019, over 15390.00 frames. ], tot_loss[loss=0.08757, simple_loss=0.1061, pruned_loss=0.02384, audio_tagging_loss=0.01066, over 3032427.93 frames. ], batch size: 57, lr: 7.69e-03, grad_scale: 32.0 2023-11-19 10:05:21,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=685240.0, ans=0.2 2023-11-19 10:05:24,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=685240.0, ans=0.125 2023-11-19 10:05:31,419 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.95 vs. limit=15.0 2023-11-19 10:05:41,605 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.978e+01 8.207e+01 8.810e+01 9.589e+01 1.176e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-19 10:05:42,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=685373.3333333334, ans=0.125 2023-11-19 10:05:46,902 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.20 vs. 
limit=6.0 2023-11-19 10:06:14,468 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 6650, loss[loss=0.09047, simple_loss=0.1155, pruned_loss=0.02257, audio_tagging_loss=0.01015, over 15377.00 frames. ], tot_loss[loss=0.08673, simple_loss=0.1053, pruned_loss=0.02347, audio_tagging_loss=0.01058, over 3040287.50 frames. ], batch size: 57, lr: 7.68e-03, grad_scale: 32.0 2023-11-19 10:06:28,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=685640.0, ans=0.2 2023-11-19 10:06:38,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=685706.6666666666, ans=0.0 2023-11-19 10:06:50,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=685773.3333333334, ans=0.125 2023-11-19 10:07:09,322 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 6700, loss[loss=0.09523, simple_loss=0.112, pruned_loss=0.02662, audio_tagging_loss=0.01261, over 14097.00 frames. ], tot_loss[loss=0.08702, simple_loss=0.1059, pruned_loss=0.02365, audio_tagging_loss=0.01042, over 3045165.53 frames. ], batch size: 54, lr: 7.68e-03, grad_scale: 32.0 2023-11-19 10:07:17,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=685906.6666666666, ans=0.1 2023-11-19 10:07:23,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.42 vs. limit=15.0 2023-11-19 10:07:31,293 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=686040.0, ans=0.125 2023-11-19 10:07:33,120 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.849e+01 8.482e+01 9.410e+01 1.023e+02 1.409e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-19 10:07:41,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=686040.0, ans=0.125 2023-11-19 10:07:44,067 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.99 vs. limit=12.0 2023-11-19 10:07:44,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=686106.6666666666, ans=0.125 2023-11-19 10:07:49,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=686106.6666666666, ans=0.125 2023-11-19 10:07:53,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=686173.3333333334, ans=0.125 2023-11-19 10:07:57,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=686173.3333333334, ans=0.2 2023-11-19 10:08:00,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=686173.3333333334, ans=0.125 2023-11-19 10:08:05,675 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 6750, loss[loss=0.08968, simple_loss=0.1095, pruned_loss=0.02221, audio_tagging_loss=0.01272, over 14391.00 frames. ], tot_loss[loss=0.08735, simple_loss=0.106, pruned_loss=0.02381, audio_tagging_loss=0.01053, over 3038477.95 frames. 
], batch size: 57, lr: 7.68e-03, grad_scale: 32.0 2023-11-19 10:08:09,102 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=686240.0, ans=0.125 2023-11-19 10:08:13,981 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=686240.0, ans=0.0 2023-11-19 10:08:16,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=686306.6666666666, ans=0.125 2023-11-19 10:08:16,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=686306.6666666666, ans=0.0 2023-11-19 10:08:17,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=686306.6666666666, ans=0.1 2023-11-19 10:08:17,083 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=686306.6666666666, ans=0.125 2023-11-19 10:08:37,835 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.63 vs. limit=22.5 2023-11-19 10:08:39,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=686440.0, ans=0.125 2023-11-19 10:08:53,832 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.84 vs. limit=15.0 2023-11-19 10:08:54,832 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=686506.6666666666, ans=0.125 2023-11-19 10:09:00,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=686573.3333333334, ans=0.035 2023-11-19 10:09:01,562 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 6800, loss[loss=0.05736, simple_loss=0.06739, pruned_loss=0.01513, audio_tagging_loss=0.008535, over 14866.00 frames. ], tot_loss[loss=0.08683, simple_loss=0.1054, pruned_loss=0.02368, audio_tagging_loss=0.01045, over 3039483.45 frames. ], batch size: 58, lr: 7.68e-03, grad_scale: 32.0 2023-11-19 10:09:22,692 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.01 vs. limit=22.5 2023-11-19 10:09:24,310 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.110e+01 8.200e+01 8.866e+01 9.843e+01 1.346e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-19 10:09:42,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=686773.3333333334, ans=0.2 2023-11-19 10:09:56,529 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 6850, loss[loss=0.1051, simple_loss=0.1312, pruned_loss=0.0306, audio_tagging_loss=0.008866, over 14931.00 frames. ], tot_loss[loss=0.08779, simple_loss=0.1067, pruned_loss=0.02405, audio_tagging_loss=0.0104, over 3037510.97 frames. 
], batch size: 55, lr: 7.68e-03, grad_scale: 32.0 2023-11-19 10:09:56,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=686906.6666666666, ans=0.0 2023-11-19 10:09:57,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=686906.6666666666, ans=0.04949747468305833 2023-11-19 10:09:59,884 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=686906.6666666666, ans=0.95 2023-11-19 10:10:11,747 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.82 vs. limit=15.0 2023-11-19 10:10:26,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=687040.0, ans=0.0 2023-11-19 10:10:28,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=687040.0, ans=0.0 2023-11-19 10:10:32,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=687106.6666666666, ans=0.1 2023-11-19 10:10:34,048 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.70 vs. limit=15.0 2023-11-19 10:10:34,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=687106.6666666666, ans=0.0 2023-11-19 10:10:50,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=687173.3333333334, ans=0.04949747468305833 2023-11-19 10:10:52,110 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 6900, loss[loss=0.1097, simple_loss=0.1391, pruned_loss=0.03012, audio_tagging_loss=0.009995, over 16407.00 frames. ], tot_loss[loss=0.08759, simple_loss=0.1066, pruned_loss=0.02385, audio_tagging_loss=0.01044, over 3038564.20 frames. 
], batch size: 58, lr: 7.67e-03, grad_scale: 32.0 2023-11-19 10:10:54,972 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=687240.0, ans=0.125 2023-11-19 10:10:57,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=687240.0, ans=0.125 2023-11-19 10:11:00,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=687240.0, ans=0.2 2023-11-19 10:11:07,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=687306.6666666666, ans=0.05 2023-11-19 10:11:10,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=687306.6666666666, ans=0.2 2023-11-19 10:11:11,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=687306.6666666666, ans=0.0 2023-11-19 10:11:15,757 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.905e+01 8.720e+01 9.438e+01 1.043e+02 1.545e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-19 10:11:23,366 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=687373.3333333334, ans=0.0 2023-11-19 10:11:34,262 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 10:11:48,000 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 6950, loss[loss=0.09599, simple_loss=0.1218, pruned_loss=0.0258, audio_tagging_loss=0.009307, over 14560.00 frames. ], tot_loss[loss=0.08831, simple_loss=0.1076, pruned_loss=0.02408, audio_tagging_loss=0.01043, over 3038246.23 frames. ], batch size: 56, lr: 7.67e-03, grad_scale: 32.0 2023-11-19 10:11:55,326 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=22.5 2023-11-19 10:11:59,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=687640.0, ans=0.125 2023-11-19 10:12:16,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=687706.6666666666, ans=0.125 2023-11-19 10:12:19,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=687773.3333333334, ans=0.0 2023-11-19 10:12:25,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=687773.3333333334, ans=0.125 2023-11-19 10:12:42,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=687906.6666666666, ans=0.0 2023-11-19 10:12:43,295 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 7000, loss[loss=0.1042, simple_loss=0.133, pruned_loss=0.03004, audio_tagging_loss=0.007724, over 15859.00 frames. 
], tot_loss[loss=0.08861, simple_loss=0.1082, pruned_loss=0.02419, audio_tagging_loss=0.01035, over 3032454.74 frames. ], batch size: 57, lr: 7.67e-03, grad_scale: 16.0 2023-11-19 10:12:49,840 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=687906.6666666666, ans=0.1 2023-11-19 10:13:05,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=688040.0, ans=0.07 2023-11-19 10:13:08,456 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.313e+01 8.438e+01 9.310e+01 1.017e+02 1.231e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-19 10:13:19,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=688106.6666666666, ans=0.125 2023-11-19 10:13:19,697 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:13:39,126 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 7050, loss[loss=0.1233, simple_loss=0.1598, pruned_loss=0.03749, audio_tagging_loss=0.005877, over 15888.00 frames. ], tot_loss[loss=0.08868, simple_loss=0.1081, pruned_loss=0.02424, audio_tagging_loss=0.01037, over 3035191.55 frames. ], batch size: 58, lr: 7.67e-03, grad_scale: 16.0 2023-11-19 10:13:44,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=688240.0, ans=0.125 2023-11-19 10:13:50,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=688306.6666666666, ans=0.1 2023-11-19 10:14:13,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=688440.0, ans=0.1 2023-11-19 10:14:35,307 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 7100, loss[loss=0.09496, simple_loss=0.1064, pruned_loss=0.02731, audio_tagging_loss=0.01446, over 15125.00 frames. ], tot_loss[loss=0.08948, simple_loss=0.1092, pruned_loss=0.02453, audio_tagging_loss=0.01036, over 3040181.30 frames. ], batch size: 57, lr: 7.67e-03, grad_scale: 16.0 2023-11-19 10:14:59,030 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.651e+01 8.454e+01 9.144e+01 1.007e+02 1.381e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-19 10:15:22,638 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=688840.0, ans=0.125 2023-11-19 10:15:23,026 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.25 vs. limit=15.0 2023-11-19 10:15:30,843 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 7150, loss[loss=0.09362, simple_loss=0.1209, pruned_loss=0.02473, audio_tagging_loss=0.008454, over 16285.00 frames. ], tot_loss[loss=0.08961, simple_loss=0.1093, pruned_loss=0.02452, audio_tagging_loss=0.01045, over 3043968.47 frames. 
], batch size: 60, lr: 7.67e-03, grad_scale: 16.0 2023-11-19 10:15:46,174 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=688973.3333333334, ans=0.1 2023-11-19 10:15:47,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=688973.3333333334, ans=0.0 2023-11-19 10:15:48,210 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:16:20,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=689173.3333333334, ans=0.0 2023-11-19 10:16:26,359 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 7200, loss[loss=0.06391, simple_loss=0.06807, pruned_loss=0.01235, audio_tagging_loss=0.01752, over 15126.00 frames. ], tot_loss[loss=0.08889, simple_loss=0.1083, pruned_loss=0.02416, audio_tagging_loss=0.01059, over 3044992.49 frames. ], batch size: 59, lr: 7.66e-03, grad_scale: 32.0 2023-11-19 10:16:50,998 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.339e+01 8.352e+01 9.080e+01 1.000e+02 1.342e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-19 10:17:02,082 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.51 vs. limit=22.5 2023-11-19 10:17:08,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=689440.0, ans=0.125 2023-11-19 10:17:15,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=689506.6666666666, ans=0.1 2023-11-19 10:17:18,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=689506.6666666666, ans=0.025 2023-11-19 10:17:21,601 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 7250, loss[loss=0.07409, simple_loss=0.08758, pruned_loss=0.01927, audio_tagging_loss=0.01103, over 15406.00 frames. ], tot_loss[loss=0.08796, simple_loss=0.1067, pruned_loss=0.02378, audio_tagging_loss=0.01082, over 3043599.88 frames. ], batch size: 61, lr: 7.66e-03, grad_scale: 32.0 2023-11-19 10:17:32,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=689640.0, ans=0.125 2023-11-19 10:17:33,553 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=689640.0, ans=0.0 2023-11-19 10:17:35,912 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.27 vs. 
limit=15.0 2023-11-19 10:17:48,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=689706.6666666666, ans=0.125 2023-11-19 10:17:49,031 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=689706.6666666666, ans=15.0 2023-11-19 10:17:49,803 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=689706.6666666666, ans=0.07 2023-11-19 10:17:50,952 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=689706.6666666666, ans=0.125 2023-11-19 10:17:54,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=689773.3333333334, ans=0.015 2023-11-19 10:18:09,910 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=689840.0, ans=0.125 2023-11-19 10:18:17,659 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 7300, loss[loss=0.05969, simple_loss=0.06958, pruned_loss=0.01182, audio_tagging_loss=0.01308, over 15003.00 frames. ], tot_loss[loss=0.08736, simple_loss=0.1059, pruned_loss=0.02361, audio_tagging_loss=0.01081, over 3040304.24 frames. ], batch size: 58, lr: 7.66e-03, grad_scale: 16.0 2023-11-19 10:18:33,185 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=689973.3333333334, ans=0.0 2023-11-19 10:18:42,977 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.905e+01 8.177e+01 8.479e+01 9.252e+01 1.452e+02, threshold=1.696e+02, percent-clipped=0.0 2023-11-19 10:19:09,648 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=690173.3333333334, ans=0.1 2023-11-19 10:19:12,532 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 7350, loss[loss=0.0799, simple_loss=0.09604, pruned_loss=0.01913, audio_tagging_loss=0.01276, over 14979.00 frames. ], tot_loss[loss=0.08841, simple_loss=0.1072, pruned_loss=0.0242, audio_tagging_loss=0.01059, over 3047435.57 frames. ], batch size: 57, lr: 7.66e-03, grad_scale: 16.0 2023-11-19 10:19:14,395 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=690240.0, ans=0.1 2023-11-19 10:19:20,737 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:19:35,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=690373.3333333334, ans=0.125 2023-11-19 10:19:35,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=690373.3333333334, ans=15.0 2023-11-19 10:20:08,406 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 7400, loss[loss=0.1046, simple_loss=0.1281, pruned_loss=0.03152, audio_tagging_loss=0.008979, over 15382.00 frames. ], tot_loss[loss=0.08727, simple_loss=0.1059, pruned_loss=0.02377, audio_tagging_loss=0.01055, over 3041897.80 frames. 
], batch size: 59, lr: 7.66e-03, grad_scale: 16.0 2023-11-19 10:20:26,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=690640.0, ans=0.125 2023-11-19 10:20:33,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=690706.6666666666, ans=0.2 2023-11-19 10:20:34,035 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.592e+01 8.284e+01 9.235e+01 1.033e+02 1.364e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 10:20:34,339 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=690706.6666666666, ans=0.1 2023-11-19 10:20:41,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=690773.3333333334, ans=0.125 2023-11-19 10:20:45,476 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=15.0 2023-11-19 10:20:51,447 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.90 vs. limit=15.0 2023-11-19 10:20:52,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=690840.0, ans=0.125 2023-11-19 10:21:04,095 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 7450, loss[loss=0.06689, simple_loss=0.08433, pruned_loss=0.01576, audio_tagging_loss=0.008963, over 14348.00 frames. ], tot_loss[loss=0.08716, simple_loss=0.1057, pruned_loss=0.02378, audio_tagging_loss=0.01054, over 3042892.80 frames. ], batch size: 55, lr: 7.65e-03, grad_scale: 16.0 2023-11-19 10:21:08,472 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.10 vs. limit=22.5 2023-11-19 10:21:24,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=690973.3333333334, ans=0.125 2023-11-19 10:21:28,448 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=691040.0, ans=0.5 2023-11-19 10:21:30,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=691040.0, ans=0.07 2023-11-19 10:21:44,747 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.97 vs. limit=12.0 2023-11-19 10:21:57,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=691173.3333333334, ans=0.1 2023-11-19 10:21:58,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=691240.0, ans=0.0 2023-11-19 10:21:59,373 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 7500, loss[loss=0.08564, simple_loss=0.1024, pruned_loss=0.02181, audio_tagging_loss=0.01264, over 15849.00 frames. ], tot_loss[loss=0.08748, simple_loss=0.106, pruned_loss=0.02399, audio_tagging_loss=0.0105, over 3037923.16 frames. 
], batch size: 60, lr: 7.65e-03, grad_scale: 16.0 2023-11-19 10:22:00,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=691240.0, ans=0.125 2023-11-19 10:22:10,622 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=691306.6666666666, ans=10.0 2023-11-19 10:22:25,171 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.501e+01 8.537e+01 9.196e+01 9.974e+01 1.563e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-19 10:22:38,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=691440.0, ans=0.125 2023-11-19 10:22:50,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=691506.6666666666, ans=0.2 2023-11-19 10:22:54,768 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 7550, loss[loss=0.08837, simple_loss=0.1109, pruned_loss=0.02537, audio_tagging_loss=0.007548, over 15528.00 frames. ], tot_loss[loss=0.08718, simple_loss=0.1058, pruned_loss=0.02378, audio_tagging_loss=0.01051, over 3041998.72 frames. ], batch size: 58, lr: 7.65e-03, grad_scale: 16.0 2023-11-19 10:23:00,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=691573.3333333334, ans=0.125 2023-11-19 10:23:37,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=691773.3333333334, ans=0.125 2023-11-19 10:23:39,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=691840.0, ans=0.125 2023-11-19 10:23:50,733 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 7600, loss[loss=0.06139, simple_loss=0.06605, pruned_loss=0.01273, audio_tagging_loss=0.01564, over 14542.00 frames. ], tot_loss[loss=0.08755, simple_loss=0.1067, pruned_loss=0.02382, audio_tagging_loss=0.01038, over 3037180.23 frames. ], batch size: 55, lr: 7.65e-03, grad_scale: 32.0 2023-11-19 10:24:10,118 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=691973.3333333334, ans=0.125 2023-11-19 10:24:16,202 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 8.367e+01 9.110e+01 1.007e+02 1.295e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-19 10:24:31,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=692106.6666666666, ans=0.07 2023-11-19 10:24:37,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=692173.3333333334, ans=0.1 2023-11-19 10:24:46,411 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 7650, loss[loss=0.07484, simple_loss=0.0938, pruned_loss=0.01989, audio_tagging_loss=0.008043, over 14173.00 frames. ], tot_loss[loss=0.0874, simple_loss=0.1065, pruned_loss=0.02378, audio_tagging_loss=0.01038, over 3041777.40 frames. 
], batch size: 53, lr: 7.65e-03, grad_scale: 16.0 2023-11-19 10:25:21,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=692440.0, ans=0.0 2023-11-19 10:25:28,300 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.17 vs. limit=22.5 2023-11-19 10:25:31,500 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.34 vs. limit=15.0 2023-11-19 10:25:42,036 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 7700, loss[loss=0.08289, simple_loss=0.09355, pruned_loss=0.02567, audio_tagging_loss=0.01045, over 16107.00 frames. ], tot_loss[loss=0.08701, simple_loss=0.1061, pruned_loss=0.02359, audio_tagging_loss=0.01035, over 3037542.01 frames. ], batch size: 64, lr: 7.64e-03, grad_scale: 16.0 2023-11-19 10:25:59,271 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=22.5 2023-11-19 10:26:02,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=692640.0, ans=0.125 2023-11-19 10:26:08,253 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 8.466e+01 9.076e+01 9.722e+01 1.155e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-19 10:26:17,123 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=692773.3333333334, ans=0.2 2023-11-19 10:26:31,919 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.82 vs. limit=12.0 2023-11-19 10:26:32,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=692840.0, ans=0.0 2023-11-19 10:26:38,229 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 7750, loss[loss=0.1078, simple_loss=0.127, pruned_loss=0.0302, audio_tagging_loss=0.01411, over 14891.00 frames. ], tot_loss[loss=0.08714, simple_loss=0.1064, pruned_loss=0.02364, audio_tagging_loss=0.01032, over 3046072.05 frames. ], batch size: 56, lr: 7.64e-03, grad_scale: 16.0 2023-11-19 10:27:11,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=693106.6666666666, ans=0.1 2023-11-19 10:27:12,169 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.12 vs. limit=22.5 2023-11-19 10:27:15,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=693106.6666666666, ans=0.0 2023-11-19 10:27:27,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=693173.3333333334, ans=0.2 2023-11-19 10:27:29,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=693173.3333333334, ans=0.125 2023-11-19 10:27:33,223 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 7800, loss[loss=0.08194, simple_loss=0.09185, pruned_loss=0.02168, audio_tagging_loss=0.01434, over 14289.00 frames. ], tot_loss[loss=0.08811, simple_loss=0.1075, pruned_loss=0.02399, audio_tagging_loss=0.01035, over 3043172.73 frames. 
], batch size: 52, lr: 7.64e-03, grad_scale: 16.0 2023-11-19 10:28:02,856 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.342e+01 8.622e+01 9.449e+01 1.060e+02 1.939e+02, threshold=1.890e+02, percent-clipped=1.0 2023-11-19 10:28:06,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=693373.3333333334, ans=0.125 2023-11-19 10:28:22,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=693506.6666666666, ans=0.2 2023-11-19 10:28:22,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=693506.6666666666, ans=0.1 2023-11-19 10:28:27,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=693506.6666666666, ans=0.1 2023-11-19 10:28:31,424 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 7850, loss[loss=0.07981, simple_loss=0.09488, pruned_loss=0.02165, audio_tagging_loss=0.01072, over 14599.00 frames. ], tot_loss[loss=0.08758, simple_loss=0.1065, pruned_loss=0.02389, audio_tagging_loss=0.01042, over 3041315.81 frames. ], batch size: 55, lr: 7.64e-03, grad_scale: 16.0 2023-11-19 10:28:38,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=693573.3333333334, ans=0.125 2023-11-19 10:28:39,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=693573.3333333334, ans=0.0 2023-11-19 10:29:00,337 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=693706.6666666666, ans=0.125 2023-11-19 10:29:03,722 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=15.0 2023-11-19 10:29:16,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=693840.0, ans=0.125 2023-11-19 10:29:27,564 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 7900, loss[loss=0.08736, simple_loss=0.1109, pruned_loss=0.02481, audio_tagging_loss=0.007087, over 14148.00 frames. ], tot_loss[loss=0.08802, simple_loss=0.1072, pruned_loss=0.024, audio_tagging_loss=0.01044, over 3043243.02 frames. ], batch size: 56, lr: 7.64e-03, grad_scale: 16.0 2023-11-19 10:29:33,363 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.87 vs. limit=15.0 2023-11-19 10:29:37,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=693973.3333333334, ans=0.2 2023-11-19 10:29:40,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=693973.3333333334, ans=0.0 2023-11-19 10:29:46,215 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=693973.3333333334, ans=0.125 2023-11-19 10:29:47,507 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.24 vs. 
limit=22.5 2023-11-19 10:29:53,070 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=694040.0, ans=0.1 2023-11-19 10:29:53,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=694040.0, ans=0.05 2023-11-19 10:29:53,846 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.703e+01 8.460e+01 9.050e+01 1.000e+02 1.219e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-19 10:29:56,383 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:29:58,891 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.76 vs. limit=15.0 2023-11-19 10:30:04,112 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=694106.6666666666, ans=0.0 2023-11-19 10:30:07,666 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0 2023-11-19 10:30:12,171 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=694173.3333333334, ans=0.125 2023-11-19 10:30:15,883 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.22 vs. limit=15.0 2023-11-19 10:30:16,777 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.30 vs. limit=15.0 2023-11-19 10:30:21,691 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=694240.0, ans=0.1 2023-11-19 10:30:22,402 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 7950, loss[loss=0.08266, simple_loss=0.095, pruned_loss=0.0249, audio_tagging_loss=0.01026, over 14116.00 frames. ], tot_loss[loss=0.08754, simple_loss=0.1062, pruned_loss=0.02384, audio_tagging_loss=0.01061, over 3042923.02 frames. ], batch size: 55, lr: 7.64e-03, grad_scale: 16.0 2023-11-19 10:30:28,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=694240.0, ans=0.125 2023-11-19 10:30:36,304 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 10:30:41,110 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.42 vs. 
limit=15.0 2023-11-19 10:31:06,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=694506.6666666666, ans=0.125 2023-11-19 10:31:06,225 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:31:10,961 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.34 vs. limit=22.5 2023-11-19 10:31:15,245 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:31:15,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=694506.6666666666, ans=0.2 2023-11-19 10:31:18,656 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 8000, loss[loss=0.08736, simple_loss=0.1055, pruned_loss=0.02444, audio_tagging_loss=0.01018, over 15504.00 frames. ], tot_loss[loss=0.0868, simple_loss=0.105, pruned_loss=0.02358, audio_tagging_loss=0.01073, over 3040829.20 frames. ], batch size: 58, lr: 7.63e-03, grad_scale: 32.0 2023-11-19 10:31:18,892 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=694573.3333333334, ans=0.09899494936611666 2023-11-19 10:31:26,932 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.40 vs. limit=10.0 2023-11-19 10:31:45,485 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.200e+01 8.170e+01 9.028e+01 9.822e+01 2.160e+02, threshold=1.806e+02, percent-clipped=1.0 2023-11-19 10:31:50,311 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.88 vs. limit=15.0 2023-11-19 10:32:13,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=694906.6666666666, ans=0.125 2023-11-19 10:32:14,654 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 8050, loss[loss=0.07937, simple_loss=0.09901, pruned_loss=0.01943, audio_tagging_loss=0.01044, over 15594.00 frames. ], tot_loss[loss=0.08702, simple_loss=0.1052, pruned_loss=0.02367, audio_tagging_loss=0.01077, over 3037397.33 frames. 
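The WARNING above (`train_asr.py:1319`) shows why some AudioSet cuts are dropped: after roughly 4x subsampling, a 100-frame cut keeps only 23 frames, fewer than its 24 BPE tokens, and a transducer loss needs at least one encoder frame per output token. A sketch of that filter; the exact front-end arithmetic is an assumption chosen to reproduce the logged 100 -> 23 mapping.

```python
# Length filter implied by the "Exclude cut ..." warnings (illustrative names).
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed convolutional front-end arithmetic; matches 100 -> 23.
    return (num_frames - 8) // 4

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer needs at least one encoder frame per output token.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23, matching the log
print(keep_cut(100, 24))              # False -> the cut is excluded
```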
], batch size: 57, lr: 7.63e-03, grad_scale: 32.0 2023-11-19 10:32:21,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=694906.6666666666, ans=0.1 2023-11-19 10:32:43,312 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=695040.0, ans=0.125 2023-11-19 10:32:44,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=695040.0, ans=0.0 2023-11-19 10:32:56,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=695106.6666666666, ans=0.0 2023-11-19 10:32:56,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=695106.6666666666, ans=0.125 2023-11-19 10:33:08,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=695173.3333333334, ans=0.125 2023-11-19 10:33:10,046 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 8100, loss[loss=0.0889, simple_loss=0.1012, pruned_loss=0.02774, audio_tagging_loss=0.01054, over 14180.00 frames. ], tot_loss[loss=0.08725, simple_loss=0.1057, pruned_loss=0.02373, audio_tagging_loss=0.01066, over 3044650.67 frames. ], batch size: 54, lr: 7.63e-03, grad_scale: 32.0 2023-11-19 10:33:18,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=695240.0, ans=10.0 2023-11-19 10:33:18,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=695240.0, ans=0.07 2023-11-19 10:33:37,036 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.784e+01 8.319e+01 9.007e+01 9.983e+01 1.168e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-19 10:33:38,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=695373.3333333334, ans=0.2 2023-11-19 10:33:53,502 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:34:00,184 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.11 vs. limit=22.5 2023-11-19 10:34:01,272 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.06 vs. limit=15.0 2023-11-19 10:34:05,390 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 8150, loss[loss=0.05269, simple_loss=0.05564, pruned_loss=0.01108, audio_tagging_loss=0.01379, over 16861.00 frames. ], tot_loss[loss=0.0874, simple_loss=0.106, pruned_loss=0.02384, audio_tagging_loss=0.01054, over 3040921.58 frames. 
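The `grad_scale` field flipping between 16.0 and 32.0 across these records is AMP dynamic loss scaling at work: the scale doubles after a run of overflow-free steps and is cut back when fp16 gradients overflow. A minimal sketch of the standard PyTorch pattern; the model/batch plumbing and the intervals are placeholders, not the recipe's values.

```python
# Standard dynamic loss scaling with torch.cuda.amp (illustrative settings).
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)

def train_step(model, batch, optimizer):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(batch)            # toy: model returns the scalar loss
    scaler.scale(loss).backward()      # scale up so fp16 grads stay representable
    scaler.step(optimizer)             # unscales grads; skips the step on inf/nan
    scaler.update()                    # grows the scale (16 -> 32) after stable steps
```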
], batch size: 65, lr: 7.63e-03, grad_scale: 32.0 2023-11-19 10:34:10,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=695573.3333333334, ans=0.125 2023-11-19 10:34:29,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=695706.6666666666, ans=0.2 2023-11-19 10:34:31,397 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=695706.6666666666, ans=0.125 2023-11-19 10:34:31,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=695706.6666666666, ans=0.125 2023-11-19 10:34:37,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=695773.3333333334, ans=0.125 2023-11-19 10:34:40,814 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=695773.3333333334, ans=0.2 2023-11-19 10:34:45,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=695773.3333333334, ans=0.0 2023-11-19 10:34:48,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=695840.0, ans=10.0 2023-11-19 10:34:50,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=695840.0, ans=0.0 2023-11-19 10:34:55,309 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.59 vs. limit=6.0 2023-11-19 10:35:01,175 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 8200, loss[loss=0.07288, simple_loss=0.08548, pruned_loss=0.01955, audio_tagging_loss=0.0106, over 15026.00 frames. ], tot_loss[loss=0.08735, simple_loss=0.1059, pruned_loss=0.02395, audio_tagging_loss=0.01044, over 3037537.05 frames. ], batch size: 59, lr: 7.63e-03, grad_scale: 32.0 2023-11-19 10:35:02,259 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 10:35:07,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=695906.6666666666, ans=0.09899494936611666 2023-11-19 10:35:09,567 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.17 vs. 
limit=10.0 2023-11-19 10:35:10,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=695906.6666666666, ans=0.1 2023-11-19 10:35:27,078 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.672e+01 8.400e+01 8.844e+01 9.876e+01 1.152e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-19 10:35:31,104 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=696040.0, ans=0.125 2023-11-19 10:35:38,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=696106.6666666666, ans=0.125 2023-11-19 10:35:50,546 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=696173.3333333334, ans=0.0 2023-11-19 10:35:56,650 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 8250, loss[loss=0.09297, simple_loss=0.1041, pruned_loss=0.0302, audio_tagging_loss=0.01071, over 14011.00 frames. ], tot_loss[loss=0.08814, simple_loss=0.1073, pruned_loss=0.02421, audio_tagging_loss=0.0103, over 3039098.60 frames. ], batch size: 55, lr: 7.62e-03, grad_scale: 32.0 2023-11-19 10:36:06,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=696306.6666666666, ans=0.125 2023-11-19 10:36:50,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=696573.3333333334, ans=0.2 2023-11-19 10:36:52,158 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 8300, loss[loss=0.08384, simple_loss=0.09268, pruned_loss=0.02335, audio_tagging_loss=0.01415, over 14451.00 frames. ], tot_loss[loss=0.0887, simple_loss=0.1077, pruned_loss=0.02446, audio_tagging_loss=0.01037, over 3047290.56 frames. ], batch size: 54, lr: 7.62e-03, grad_scale: 32.0 2023-11-19 10:36:52,412 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:36:56,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=696573.3333333334, ans=0.0 2023-11-19 10:37:14,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=696706.6666666666, ans=0.125 2023-11-19 10:37:18,735 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.294e+01 8.396e+01 9.218e+01 1.018e+02 1.275e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-19 10:37:23,156 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=696706.6666666666, ans=0.125 2023-11-19 10:37:34,281 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=696773.3333333334, ans=0.0 2023-11-19 10:37:41,013 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.11 vs. limit=10.0 2023-11-19 10:37:47,186 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 8350, loss[loss=0.09694, simple_loss=0.1138, pruned_loss=0.0309, audio_tagging_loss=0.009112, over 14308.00 frames. ], tot_loss[loss=0.08819, simple_loss=0.1073, pruned_loss=0.02417, audio_tagging_loss=0.01037, over 3049650.15 frames. 
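Each progress record pairs the current batch's `loss[...]` with a `tot_loss[...]`; the fractional totals such as `over 3049650.15 frames` suggest a frame-weighted running sum with a mild exponential decay rather than a plain average. A toy tracker in that spirit; the decay constant is an assumption.

```python
# Frame-weighted running loss totals with exponential decay (illustrative).
class RunningLoss:
    def __init__(self, decay: float = 0.995):  # decay value is an assumption
        self.decay = decay
        self.sums = {}
        self.frames = 0.0

    def update(self, batch_frames: float, **losses: float) -> None:
        # losses are per-frame averages, as printed in the loss[...] records
        self.frames = self.decay * self.frames + batch_frames
        for name, value in losses.items():
            self.sums[name] = self.decay * self.sums.get(name, 0.0) + value * batch_frames

    def averages(self) -> dict:
        return {name: total / self.frames for name, total in self.sums.items()}

tracker = RunningLoss()
tracker.update(15528.0, loss=0.08837, simple_loss=0.1109)
print(tracker.averages())  # after one batch, equal to that batch's averages
```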
], batch size: 56, lr: 7.62e-03, grad_scale: 32.0 2023-11-19 10:37:48,498 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=696906.6666666666, ans=0.125 2023-11-19 10:37:55,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=696906.6666666666, ans=0.125 2023-11-19 10:38:02,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=696973.3333333334, ans=0.04949747468305833 2023-11-19 10:38:11,132 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=15.0 2023-11-19 10:38:19,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=697106.6666666666, ans=0.125 2023-11-19 10:38:35,786 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. limit=6.0 2023-11-19 10:38:41,272 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=697173.3333333334, ans=0.0 2023-11-19 10:38:43,143 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 8400, loss[loss=0.08856, simple_loss=0.1085, pruned_loss=0.02321, audio_tagging_loss=0.01109, over 15133.00 frames. ], tot_loss[loss=0.08831, simple_loss=0.1076, pruned_loss=0.02425, audio_tagging_loss=0.01027, over 3051262.66 frames. ], batch size: 58, lr: 7.62e-03, grad_scale: 32.0 2023-11-19 10:38:47,823 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.99 vs. limit=22.5 2023-11-19 10:39:09,204 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.657e+01 8.184e+01 9.115e+01 9.863e+01 1.459e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-19 10:39:31,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=697506.6666666666, ans=0.1 2023-11-19 10:39:34,771 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=697506.6666666666, ans=0.07 2023-11-19 10:39:37,766 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 8450, loss[loss=0.1046, simple_loss=0.132, pruned_loss=0.03083, audio_tagging_loss=0.007735, over 14943.00 frames. ], tot_loss[loss=0.0873, simple_loss=0.1063, pruned_loss=0.02386, audio_tagging_loss=0.01028, over 3054657.37 frames. ], batch size: 56, lr: 7.62e-03, grad_scale: 32.0 2023-11-19 10:39:54,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=697640.0, ans=0.125 2023-11-19 10:40:01,917 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.14 vs. 
limit=22.5 2023-11-19 10:40:10,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=697773.3333333334, ans=0.125 2023-11-19 10:40:14,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=697773.3333333334, ans=0.125 2023-11-19 10:40:25,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=697840.0, ans=0.0 2023-11-19 10:40:33,449 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 8500, loss[loss=0.07977, simple_loss=0.0961, pruned_loss=0.02204, audio_tagging_loss=0.009684, over 14535.00 frames. ], tot_loss[loss=0.08618, simple_loss=0.1045, pruned_loss=0.02339, audio_tagging_loss=0.01052, over 3054837.94 frames. ], batch size: 58, lr: 7.62e-03, grad_scale: 32.0 2023-11-19 10:40:38,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=697906.6666666666, ans=0.125 2023-11-19 10:40:59,992 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.997e+01 8.736e+01 1.015e+02 1.178e+02 2.396e+02, threshold=2.030e+02, percent-clipped=2.0 2023-11-19 10:41:04,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=698040.0, ans=0.125 2023-11-19 10:41:18,186 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=698173.3333333334, ans=0.125 2023-11-19 10:41:18,523 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.95 vs. limit=22.5 2023-11-19 10:41:28,735 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.61 vs. limit=12.0 2023-11-19 10:41:29,463 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 8550, loss[loss=0.09778, simple_loss=0.1183, pruned_loss=0.02707, audio_tagging_loss=0.01157, over 14457.00 frames. ], tot_loss[loss=0.08661, simple_loss=0.1052, pruned_loss=0.02351, audio_tagging_loss=0.01047, over 3055865.97 frames. ], batch size: 57, lr: 7.61e-03, grad_scale: 32.0 2023-11-19 10:41:29,733 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=698240.0, ans=0.125 2023-11-19 10:41:35,959 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=698240.0, ans=0.05 2023-11-19 10:42:04,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=698440.0, ans=0.125 2023-11-19 10:42:21,838 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=698506.6666666666, ans=0.125 2023-11-19 10:42:23,876 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 8600, loss[loss=0.09119, simple_loss=0.1043, pruned_loss=0.02299, audio_tagging_loss=0.01605, over 16729.00 frames. ], tot_loss[loss=0.08715, simple_loss=0.1058, pruned_loss=0.02368, audio_tagging_loss=0.01059, over 3055144.29 frames. 
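The learning rate drifts from 7.65e-03 down toward 7.56e-03 over this stretch of the log, consistent with a schedule that decays smoothly in both batch index and epoch. A sketch in the spirit of icefall's Eden scheduler; the exact exponents and the batch index used in the example are assumptions.

```python
# Eden-style learning-rate schedule (schematic; exponents are assumptions).
def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(eden_lr(0.045, batch=104_000, epoch=9))  # ~7.3e-03, the ballpark logged here
```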
], batch size: 62, lr: 7.61e-03, grad_scale: 32.0 2023-11-19 10:42:39,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=698640.0, ans=0.125 2023-11-19 10:42:50,809 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.955e+01 8.414e+01 9.085e+01 1.004e+02 1.428e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-19 10:43:06,719 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.12 vs. limit=22.5 2023-11-19 10:43:09,756 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.43 vs. limit=15.0 2023-11-19 10:43:17,077 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.66 vs. limit=22.5 2023-11-19 10:43:19,389 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 8650, loss[loss=0.08948, simple_loss=0.1151, pruned_loss=0.02292, audio_tagging_loss=0.00902, over 14796.00 frames. ], tot_loss[loss=0.0876, simple_loss=0.1066, pruned_loss=0.02371, audio_tagging_loss=0.0106, over 3056447.37 frames. ], batch size: 57, lr: 7.61e-03, grad_scale: 32.0 2023-11-19 10:43:23,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=698906.6666666666, ans=0.125 2023-11-19 10:43:23,376 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=698906.6666666666, ans=0.07 2023-11-19 10:43:55,734 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.13 vs. limit=15.0 2023-11-19 10:44:09,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=699173.3333333334, ans=0.0 2023-11-19 10:44:15,095 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 8700, loss[loss=0.1086, simple_loss=0.1346, pruned_loss=0.03405, audio_tagging_loss=0.007267, over 14939.00 frames. ], tot_loss[loss=0.08789, simple_loss=0.107, pruned_loss=0.02385, audio_tagging_loss=0.01056, over 3063257.03 frames. ], batch size: 56, lr: 7.61e-03, grad_scale: 32.0 2023-11-19 10:44:23,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.90 vs. limit=15.0 2023-11-19 10:44:26,853 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.20 vs. limit=15.0 2023-11-19 10:44:41,442 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.962e+01 8.409e+01 9.264e+01 1.013e+02 1.808e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-19 10:44:45,936 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.17 vs. limit=12.0 2023-11-19 10:45:03,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=699506.6666666666, ans=0.1 2023-11-19 10:45:10,501 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 8750, loss[loss=0.07993, simple_loss=0.09928, pruned_loss=0.01743, audio_tagging_loss=0.01285, over 14614.00 frames. ], tot_loss[loss=0.08815, simple_loss=0.1072, pruned_loss=0.02387, audio_tagging_loss=0.0107, over 3056001.90 frames. 
], batch size: 56, lr: 7.61e-03, grad_scale: 32.0 2023-11-19 10:45:36,500 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=699706.6666666666, ans=0.1 2023-11-19 10:45:44,211 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0 2023-11-19 10:45:47,132 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=699773.3333333334, ans=0.125 2023-11-19 10:45:55,093 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.08 vs. limit=22.5 2023-11-19 10:46:05,641 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 8800, loss[loss=0.1037, simple_loss=0.1193, pruned_loss=0.03075, audio_tagging_loss=0.01334, over 16576.00 frames. ], tot_loss[loss=0.08835, simple_loss=0.1074, pruned_loss=0.02393, audio_tagging_loss=0.01074, over 3052710.18 frames. ], batch size: 62, lr: 7.60e-03, grad_scale: 32.0 2023-11-19 10:46:33,878 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.476e+01 9.103e+01 9.978e+01 1.212e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-19 10:46:52,949 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=700173.3333333334, ans=0.125 2023-11-19 10:47:01,760 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 8850, loss[loss=0.07534, simple_loss=0.09388, pruned_loss=0.01918, audio_tagging_loss=0.009214, over 15846.00 frames. ], tot_loss[loss=0.08757, simple_loss=0.1064, pruned_loss=0.0236, audio_tagging_loss=0.01078, over 3060327.49 frames. ], batch size: 58, lr: 7.60e-03, grad_scale: 16.0 2023-11-19 10:47:01,907 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=700240.0, ans=0.035 2023-11-19 10:47:06,133 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=700240.0, ans=0.125 2023-11-19 10:47:12,252 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 10:47:33,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=700440.0, ans=0.2 2023-11-19 10:47:44,053 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.00 vs. limit=15.0 2023-11-19 10:47:56,185 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 8900, loss[loss=0.06544, simple_loss=0.0648, pruned_loss=0.01976, audio_tagging_loss=0.01328, over 14631.00 frames. ], tot_loss[loss=0.08729, simple_loss=0.1064, pruned_loss=0.02356, audio_tagging_loss=0.01054, over 3059422.59 frames. 
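For offline analysis it is handy to scrape these records back out of the raw log, after which the loss curves plot directly. A small parser for the `tot_loss[...]` format used throughout this file; the regex targets only the fields shown above.

```python
# Extract (epoch, batch, tot_loss components) from the raw training log.
import re

PATTERN = re.compile(
    r"Epoch (\d+), batch (\d+), .*?"
    r"tot_loss\[loss=([\d.]+), simple_loss=([\d.]+), "
    r"pruned_loss=([\d.]+), audio_tagging_loss=([\d.]+)",
    re.DOTALL,  # records wrap across line breaks in this dump
)

def parse_tot_loss(text):
    for m in PATTERN.finditer(text):
        epoch, batch = int(m.group(1)), int(m.group(2))
        yield epoch, batch, tuple(float(m.group(i)) for i in range(3, 7))

log_text = (
    "Epoch 9, batch 7550, loss[loss=0.08837, ...], "
    "tot_loss[loss=0.08718, simple_loss=0.1058, "
    "pruned_loss=0.02378, audio_tagging_loss=0.01051, over 3041998.72 frames. ]"
)
for epoch, batch, losses in parse_tot_loss(log_text):
    print(epoch, batch, losses)  # 9 7550 (0.08718, 0.1058, 0.02378, 0.01051)
```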
], batch size: 58, lr: 7.60e-03, grad_scale: 16.0 2023-11-19 10:48:04,989 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=700573.3333333334, ans=0.125 2023-11-19 10:48:12,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=700640.0, ans=0.2 2023-11-19 10:48:25,773 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.967e+01 8.384e+01 9.220e+01 1.025e+02 1.340e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-19 10:48:52,103 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 8950, loss[loss=0.08453, simple_loss=0.1012, pruned_loss=0.02186, audio_tagging_loss=0.01206, over 14511.00 frames. ], tot_loss[loss=0.08747, simple_loss=0.1069, pruned_loss=0.02365, audio_tagging_loss=0.01035, over 3059157.37 frames. ], batch size: 55, lr: 7.60e-03, grad_scale: 16.0 2023-11-19 10:48:56,738 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.43 vs. limit=22.5 2023-11-19 10:48:56,747 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.00 vs. limit=15.0 2023-11-19 10:49:00,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=700906.6666666666, ans=0.09899494936611666 2023-11-19 10:49:19,120 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=22.5 2023-11-19 10:49:34,343 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=701106.6666666666, ans=0.125 2023-11-19 10:49:37,665 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=701173.3333333334, ans=0.125 2023-11-19 10:49:47,834 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 9000, loss[loss=0.1027, simple_loss=0.13, pruned_loss=0.02866, audio_tagging_loss=0.009025, over 14592.00 frames. ], tot_loss[loss=0.08753, simple_loss=0.1073, pruned_loss=0.02361, audio_tagging_loss=0.01025, over 3058247.40 frames. ], batch size: 57, lr: 7.60e-03, grad_scale: 16.0 2023-11-19 10:49:47,835 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-19 10:50:09,279 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([0.9929, 3.1262, 2.7386, 2.6908, 3.5402, 3.6764, 2.9023, 3.7050], device='cuda:3') 2023-11-19 10:50:20,508 INFO [train_asr.py:1147] (3/4) Epoch 9, validation: loss=0.06655, simple_loss=0.05588, pruned_loss=0.006694, audio_tagging_loss=0.03192, over 4681554.00 frames. 
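The validation pass above also dumps per-head attention-weight entropies (`zipformer.py:1873`), e.g. `tensor([0.9929, 3.1262, ...])`: low values flag heads that attend very sharply, values near log(T) flag near-uniform heads. A sketch of that diagnostic; the tensor shape and names are assumptions.

```python
# Per-head entropy of attention weights, averaged over query positions.
import torch

def attn_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, num_queries, num_keys), each row summing to 1
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (heads, queries)
    return ent.mean(dim=-1)                           # one value per head

weights = torch.softmax(torch.randn(4, 50, 50), dim=-1)
print(attn_entropy(weights))  # somewhat below log(50) ~ 3.9 for random heads
```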
2023-11-19 10:50:20,508 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-19 10:50:40,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=701306.6666666666, ans=0.2 2023-11-19 10:50:50,123 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.156e+01 8.217e+01 9.187e+01 1.011e+02 1.342e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-19 10:51:00,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=701440.0, ans=0.09899494936611666 2023-11-19 10:51:11,528 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.30 vs. limit=15.0 2023-11-19 10:51:16,390 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 9050, loss[loss=0.094, simple_loss=0.1223, pruned_loss=0.02532, audio_tagging_loss=0.007537, over 15813.00 frames. ], tot_loss[loss=0.088, simple_loss=0.1081, pruned_loss=0.02378, audio_tagging_loss=0.01015, over 3054935.16 frames. ], batch size: 59, lr: 7.60e-03, grad_scale: 16.0 2023-11-19 10:51:21,629 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.94 vs. limit=22.5 2023-11-19 10:51:24,437 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=701573.3333333334, ans=0.125 2023-11-19 10:51:30,588 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.17 vs. limit=15.0 2023-11-19 10:51:31,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=701640.0, ans=0.025 2023-11-19 10:51:33,534 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=701640.0, ans=0.1 2023-11-19 10:51:53,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=701773.3333333334, ans=0.2 2023-11-19 10:52:01,710 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.00 vs. limit=12.0 2023-11-19 10:52:12,355 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 9100, loss[loss=0.1037, simple_loss=0.1247, pruned_loss=0.03214, audio_tagging_loss=0.009206, over 14710.00 frames. ], tot_loss[loss=0.08778, simple_loss=0.1078, pruned_loss=0.02367, audio_tagging_loss=0.01021, over 3060487.64 frames. ], batch size: 54, lr: 7.59e-03, grad_scale: 8.0 2023-11-19 10:52:19,248 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.84 vs. limit=6.0 2023-11-19 10:52:28,405 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.30 vs. 
limit=10.0 2023-11-19 10:52:36,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=702040.0, ans=0.125 2023-11-19 10:52:41,734 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.195e+01 8.498e+01 9.044e+01 9.975e+01 1.337e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-19 10:52:48,994 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=702106.6666666666, ans=0.125 2023-11-19 10:53:02,529 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.24 vs. limit=12.0 2023-11-19 10:53:07,270 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 9150, loss[loss=0.08106, simple_loss=0.09695, pruned_loss=0.02164, audio_tagging_loss=0.01095, over 15527.00 frames. ], tot_loss[loss=0.08656, simple_loss=0.1061, pruned_loss=0.0233, audio_tagging_loss=0.01019, over 3057553.70 frames. ], batch size: 56, lr: 7.59e-03, grad_scale: 8.0 2023-11-19 10:53:10,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=702240.0, ans=0.125 2023-11-19 10:53:28,458 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.04 vs. limit=6.0 2023-11-19 10:53:37,342 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=22.5 2023-11-19 10:53:49,685 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=702440.0, ans=0.125 2023-11-19 10:53:52,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=702506.6666666666, ans=0.0 2023-11-19 10:53:58,400 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.99 vs. limit=22.5 2023-11-19 10:54:02,855 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 9200, loss[loss=0.08614, simple_loss=0.119, pruned_loss=0.01927, audio_tagging_loss=0.007391, over 15385.00 frames. ], tot_loss[loss=0.08618, simple_loss=0.1054, pruned_loss=0.02319, audio_tagging_loss=0.0103, over 3058067.20 frames. ], batch size: 58, lr: 7.59e-03, grad_scale: 16.0 2023-11-19 10:54:02,926 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=702573.3333333334, ans=0.125 2023-11-19 10:54:04,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=702573.3333333334, ans=0.125 2023-11-19 10:54:23,134 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=702640.0, ans=0.1 2023-11-19 10:54:33,448 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.088e+01 8.309e+01 9.173e+01 1.001e+02 1.258e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-19 10:54:54,626 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.59 vs. limit=12.0 2023-11-19 10:54:59,966 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 9250, loss[loss=0.07193, simple_loss=0.08997, pruned_loss=0.01748, audio_tagging_loss=0.009476, over 15921.00 frames. 
], tot_loss[loss=0.08665, simple_loss=0.1061, pruned_loss=0.02337, audio_tagging_loss=0.01023, over 3065356.51 frames. ], batch size: 59, lr: 7.59e-03, grad_scale: 16.0 2023-11-19 10:55:02,247 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:55:28,081 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:55:29,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=703040.0, ans=0.2 2023-11-19 10:55:40,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=703106.6666666666, ans=10.0 2023-11-19 10:55:41,828 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=703106.6666666666, ans=0.025 2023-11-19 10:55:54,240 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 9300, loss[loss=0.1032, simple_loss=0.1269, pruned_loss=0.03047, audio_tagging_loss=0.009256, over 16039.00 frames. ], tot_loss[loss=0.08695, simple_loss=0.1063, pruned_loss=0.02345, audio_tagging_loss=0.01034, over 3067087.23 frames. ], batch size: 57, lr: 7.59e-03, grad_scale: 16.0 2023-11-19 10:56:07,101 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.88 vs. limit=15.0 2023-11-19 10:56:13,921 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.14 vs. limit=10.0 2023-11-19 10:56:22,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=703373.3333333334, ans=0.125 2023-11-19 10:56:25,077 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.123e+01 8.507e+01 9.203e+01 9.999e+01 1.405e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-19 10:56:28,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=703440.0, ans=0.125 2023-11-19 10:56:50,080 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 9350, loss[loss=0.08162, simple_loss=0.1051, pruned_loss=0.02035, audio_tagging_loss=0.008714, over 14579.00 frames. ], tot_loss[loss=0.08748, simple_loss=0.1069, pruned_loss=0.02373, audio_tagging_loss=0.01028, over 3065108.09 frames. ], batch size: 55, lr: 7.59e-03, grad_scale: 16.0 2023-11-19 10:56:55,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=703573.3333333334, ans=0.2 2023-11-19 10:57:05,071 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.05 vs. 
limit=15.0 2023-11-19 10:57:10,823 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=703640.0, ans=0.1 2023-11-19 10:57:23,021 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=703773.3333333334, ans=0.1 2023-11-19 10:57:23,059 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=703773.3333333334, ans=0.125 2023-11-19 10:57:24,092 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:57:40,660 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.97 vs. limit=15.0 2023-11-19 10:57:46,474 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 9400, loss[loss=0.07929, simple_loss=0.1021, pruned_loss=0.01877, audio_tagging_loss=0.009464, over 15643.00 frames. ], tot_loss[loss=0.08747, simple_loss=0.1068, pruned_loss=0.0237, audio_tagging_loss=0.01039, over 3070600.77 frames. ], batch size: 57, lr: 7.58e-03, grad_scale: 16.0 2023-11-19 10:57:50,208 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.73 vs. limit=22.5 2023-11-19 10:57:52,084 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=703906.6666666666, ans=0.0 2023-11-19 10:57:56,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=703973.3333333334, ans=0.2 2023-11-19 10:58:07,563 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=704040.0, ans=0.0 2023-11-19 10:58:13,930 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=704040.0, ans=0.05 2023-11-19 10:58:15,741 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.161e+01 8.579e+01 9.388e+01 1.029e+02 1.331e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-19 10:58:39,700 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 10:58:41,781 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 9450, loss[loss=0.07263, simple_loss=0.08431, pruned_loss=0.01787, audio_tagging_loss=0.0126, over 14513.00 frames. ], tot_loss[loss=0.08774, simple_loss=0.1068, pruned_loss=0.0237, audio_tagging_loss=0.01062, over 3064887.66 frames. 
], batch size: 56, lr: 7.58e-03, grad_scale: 16.0 2023-11-19 10:59:13,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=704373.3333333334, ans=10.0 2023-11-19 10:59:19,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=704440.0, ans=0.1 2023-11-19 10:59:33,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=704506.6666666666, ans=0.07 2023-11-19 10:59:36,678 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 9500, loss[loss=0.08165, simple_loss=0.09584, pruned_loss=0.02505, audio_tagging_loss=0.008679, over 14808.00 frames. ], tot_loss[loss=0.08767, simple_loss=0.1067, pruned_loss=0.02364, audio_tagging_loss=0.01067, over 3061018.39 frames. ], batch size: 57, lr: 7.58e-03, grad_scale: 16.0 2023-11-19 10:59:47,123 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.21 vs. limit=10.0 2023-11-19 10:59:51,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=704640.0, ans=0.1 2023-11-19 11:00:02,826 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:00:02,942 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.50 vs. limit=15.0 2023-11-19 11:00:06,712 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.471e+01 8.371e+01 9.001e+01 9.881e+01 1.664e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-19 11:00:07,025 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=704706.6666666666, ans=0.0 2023-11-19 11:00:17,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=704773.3333333334, ans=0.05 2023-11-19 11:00:18,284 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.47 vs. limit=15.0 2023-11-19 11:00:30,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=704840.0, ans=0.1 2023-11-19 11:00:32,114 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 9550, loss[loss=0.09873, simple_loss=0.1217, pruned_loss=0.02726, audio_tagging_loss=0.01064, over 15253.00 frames. ], tot_loss[loss=0.08746, simple_loss=0.1062, pruned_loss=0.02369, audio_tagging_loss=0.01067, over 3057587.23 frames. 
], batch size: 56, lr: 7.58e-03, grad_scale: 16.0 2023-11-19 11:00:35,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=704906.6666666666, ans=0.125 2023-11-19 11:00:41,939 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=704906.6666666666, ans=0.0 2023-11-19 11:00:50,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=704973.3333333334, ans=0.125 2023-11-19 11:01:11,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=705106.6666666666, ans=0.1 2023-11-19 11:01:28,004 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 9600, loss[loss=0.0937, simple_loss=0.1143, pruned_loss=0.02473, audio_tagging_loss=0.01182, over 15810.00 frames. ], tot_loss[loss=0.08722, simple_loss=0.1061, pruned_loss=0.02347, audio_tagging_loss=0.01072, over 3052926.29 frames. ], batch size: 59, lr: 7.58e-03, grad_scale: 32.0 2023-11-19 11:01:36,239 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:01:39,301 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=705306.6666666666, ans=0.0 2023-11-19 11:01:58,234 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.539e+01 8.373e+01 9.061e+01 9.893e+01 1.304e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-19 11:02:00,656 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=705440.0, ans=0.125 2023-11-19 11:02:07,426 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=705440.0, ans=0.125 2023-11-19 11:02:11,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=705506.6666666666, ans=0.0 2023-11-19 11:02:23,456 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 9650, loss[loss=0.09293, simple_loss=0.1097, pruned_loss=0.02684, audio_tagging_loss=0.01124, over 14953.00 frames. ], tot_loss[loss=0.08726, simple_loss=0.1061, pruned_loss=0.02358, audio_tagging_loss=0.01063, over 3047183.05 frames. ], batch size: 56, lr: 7.57e-03, grad_scale: 32.0 2023-11-19 11:02:36,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=705640.0, ans=10.0 2023-11-19 11:02:38,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=705640.0, ans=15.0 2023-11-19 11:02:51,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=705706.6666666666, ans=0.125 2023-11-19 11:02:56,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=705773.3333333334, ans=0.2 2023-11-19 11:03:17,162 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.19 vs. limit=22.5 2023-11-19 11:03:18,572 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 9700, loss[loss=0.09601, simple_loss=0.1279, pruned_loss=0.0237, audio_tagging_loss=0.008337, over 15038.00 frames. 
], tot_loss[loss=0.08782, simple_loss=0.1072, pruned_loss=0.02384, audio_tagging_loss=0.01038, over 3042656.88 frames. ], batch size: 57, lr: 7.57e-03, grad_scale: 32.0 2023-11-19 11:03:22,568 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=705906.6666666666, ans=0.05 2023-11-19 11:03:29,897 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:03:35,593 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=705973.3333333334, ans=0.125 2023-11-19 11:03:46,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=706040.0, ans=0.125 2023-11-19 11:03:48,692 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.927e+01 8.495e+01 9.250e+01 1.013e+02 1.601e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-19 11:03:53,717 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.39 vs. limit=22.5 2023-11-19 11:03:59,919 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=706106.6666666666, ans=0.125 2023-11-19 11:04:09,245 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.79 vs. limit=6.0 2023-11-19 11:04:11,085 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=706173.3333333334, ans=0.125 2023-11-19 11:04:13,978 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 9750, loss[loss=0.09669, simple_loss=0.1184, pruned_loss=0.02761, audio_tagging_loss=0.009898, over 15269.00 frames. ], tot_loss[loss=0.08806, simple_loss=0.1079, pruned_loss=0.02384, audio_tagging_loss=0.01029, over 3042594.84 frames. ], batch size: 56, lr: 7.57e-03, grad_scale: 32.0 2023-11-19 11:04:22,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=706240.0, ans=0.125 2023-11-19 11:04:36,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=706373.3333333334, ans=0.125 2023-11-19 11:04:50,073 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=706440.0, ans=0.0 2023-11-19 11:04:52,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=706440.0, ans=0.125 2023-11-19 11:04:53,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=706440.0, ans=0.125 2023-11-19 11:04:54,255 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=706440.0, ans=0.125 2023-11-19 11:05:04,636 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.42 vs. 
limit=22.5 2023-11-19 11:05:05,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=706506.6666666666, ans=0.2 2023-11-19 11:05:06,827 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=706506.6666666666, ans=0.125 2023-11-19 11:05:09,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=706573.3333333334, ans=0.0 2023-11-19 11:05:09,876 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 9800, loss[loss=0.1089, simple_loss=0.1364, pruned_loss=0.03088, audio_tagging_loss=0.009836, over 15323.00 frames. ], tot_loss[loss=0.08687, simple_loss=0.1061, pruned_loss=0.02354, audio_tagging_loss=0.01027, over 3042639.21 frames. ], batch size: 58, lr: 7.57e-03, grad_scale: 32.0 2023-11-19 11:05:37,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=706706.6666666666, ans=0.1 2023-11-19 11:05:39,679 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.318e+01 9.223e+01 1.040e+02 1.482e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-19 11:05:53,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=706840.0, ans=0.1 2023-11-19 11:05:55,190 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.00 vs. limit=22.5 2023-11-19 11:05:59,822 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 11:06:05,686 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 9850, loss[loss=0.07196, simple_loss=0.08597, pruned_loss=0.01973, audio_tagging_loss=0.009256, over 15293.00 frames. ], tot_loss[loss=0.08681, simple_loss=0.1063, pruned_loss=0.02349, audio_tagging_loss=0.01018, over 3042938.77 frames. ], batch size: 60, lr: 7.57e-03, grad_scale: 32.0 2023-11-19 11:06:11,079 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=706906.6666666666, ans=0.0 2023-11-19 11:06:16,437 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.60 vs. limit=6.0 2023-11-19 11:06:16,456 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.70 vs. 
limit=5.0 2023-11-19 11:06:34,842 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=707040.0, ans=0.125 2023-11-19 11:06:49,187 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:06:51,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=707173.3333333334, ans=0.1 2023-11-19 11:06:52,599 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.79 vs. limit=10.0 2023-11-19 11:06:58,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=707173.3333333334, ans=0.0 2023-11-19 11:07:01,177 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 9900, loss[loss=0.1069, simple_loss=0.1243, pruned_loss=0.03101, audio_tagging_loss=0.01371, over 13835.00 frames. ], tot_loss[loss=0.08725, simple_loss=0.1067, pruned_loss=0.02374, audio_tagging_loss=0.01015, over 3041060.83 frames. ], batch size: 53, lr: 7.57e-03, grad_scale: 32.0 2023-11-19 11:07:01,413 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=707240.0, ans=0.1 2023-11-19 11:07:09,911 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=707240.0, ans=0.0 2023-11-19 11:07:31,312 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.468e+01 8.507e+01 9.296e+01 1.006e+02 1.418e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 11:07:33,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=707440.0, ans=0.0 2023-11-19 11:07:34,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=707440.0, ans=0.0 2023-11-19 11:07:46,288 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.50 vs. limit=15.0 2023-11-19 11:07:50,383 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.56 vs. limit=12.0 2023-11-19 11:07:56,768 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.41 vs. limit=15.0 2023-11-19 11:07:57,332 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 9950, loss[loss=0.0918, simple_loss=0.1109, pruned_loss=0.02564, audio_tagging_loss=0.01071, over 14321.00 frames. ], tot_loss[loss=0.08736, simple_loss=0.1065, pruned_loss=0.02382, audio_tagging_loss=0.01026, over 3050256.00 frames. ], batch size: 55, lr: 7.56e-03, grad_scale: 32.0 2023-11-19 11:08:07,913 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=707640.0, ans=0.0 2023-11-19 11:08:19,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=707706.6666666666, ans=0.125 2023-11-19 11:08:21,242 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.77 vs. 
limit=15.0 2023-11-19 11:08:30,378 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=707773.3333333334, ans=0.1 2023-11-19 11:08:42,292 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.89 vs. limit=12.0 2023-11-19 11:08:52,383 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 10000, loss[loss=0.08278, simple_loss=0.09428, pruned_loss=0.02306, audio_tagging_loss=0.01258, over 15272.00 frames. ], tot_loss[loss=0.08635, simple_loss=0.1051, pruned_loss=0.02345, audio_tagging_loss=0.01034, over 3049499.39 frames. ], batch size: 60, lr: 7.56e-03, grad_scale: 32.0 2023-11-19 11:08:58,379 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=707906.6666666666, ans=0.125 2023-11-19 11:09:21,498 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.50 vs. limit=8.0 2023-11-19 11:09:23,371 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.949e+01 8.309e+01 8.896e+01 1.009e+02 1.315e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-19 11:09:29,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=708106.6666666666, ans=0.5 2023-11-19 11:09:38,157 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=708173.3333333334, ans=0.2 2023-11-19 11:09:49,099 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 10050, loss[loss=0.09765, simple_loss=0.119, pruned_loss=0.02338, audio_tagging_loss=0.01478, over 15655.00 frames. ], tot_loss[loss=0.08687, simple_loss=0.1057, pruned_loss=0.02362, audio_tagging_loss=0.0104, over 3047792.15 frames. ], batch size: 58, lr: 7.56e-03, grad_scale: 32.0 2023-11-19 11:09:51,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=708240.0, ans=0.1 2023-11-19 11:10:02,811 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.13 vs. limit=10.0 2023-11-19 11:10:06,742 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=708306.6666666666, ans=0.0 2023-11-19 11:10:12,690 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=708373.3333333334, ans=0.0 2023-11-19 11:10:24,826 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=708440.0, ans=0.125 2023-11-19 11:10:44,153 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 10100, loss[loss=0.08294, simple_loss=0.09413, pruned_loss=0.02142, audio_tagging_loss=0.01445, over 15179.00 frames. ], tot_loss[loss=0.08647, simple_loss=0.1051, pruned_loss=0.02342, audio_tagging_loss=0.0105, over 3053296.54 frames. 
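The tot_loss fields in these entries decompose consistently as loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss; for batch 10100 immediately above, 0.5 * 0.1051 + 0.02342 + 0.0105 = 0.08647, matching the printed loss. The 0.5 and 1.0 weights are inferred from the logged numbers (they correspond to the recipe's simple-loss and audio-tagging-loss scales), so treat them as assumptions in this minimal check:

    # Sketch: reproduce the combined loss printed in the log entries.
    # The weights (0.5 for simple_loss, 1.0 for audio_tagging_loss) are
    # inferred from the logged values, not read from the recipe config.
    def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                      simple_scale=0.5, tagging_scale=1.0):
        return simple_scale * simple_loss + pruned_loss + tagging_scale * audio_tagging_loss

    # batch 10100 above:
    assert abs(combined_loss(0.1051, 0.02342, 0.0105) - 0.08647) < 1e-5
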
], batch size: 56, lr: 7.56e-03, grad_scale: 32.0 2023-11-19 11:10:51,763 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=708573.3333333334, ans=0.0 2023-11-19 11:11:15,749 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.056e+01 8.397e+01 8.991e+01 1.000e+02 1.217e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 11:11:20,577 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.24 vs. limit=12.0 2023-11-19 11:11:29,055 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 11:11:40,086 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 10150, loss[loss=0.05843, simple_loss=0.06481, pruned_loss=0.01349, audio_tagging_loss=0.01253, over 15132.00 frames. ], tot_loss[loss=0.08652, simple_loss=0.1049, pruned_loss=0.0235, audio_tagging_loss=0.0106, over 3053694.74 frames. ], batch size: 59, lr: 7.56e-03, grad_scale: 16.0 2023-11-19 11:11:42,385 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=708906.6666666666, ans=0.0 2023-11-19 11:12:05,490 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 11:12:05,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=709040.0, ans=0.1 2023-11-19 11:12:16,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=709106.6666666666, ans=0.2 2023-11-19 11:12:20,692 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.35 vs. limit=15.0 2023-11-19 11:12:22,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=709106.6666666666, ans=0.125 2023-11-19 11:12:29,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=709173.3333333334, ans=0.0 2023-11-19 11:12:31,438 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=709173.3333333334, ans=0.125 2023-11-19 11:12:33,099 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:12:36,057 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 10200, loss[loss=0.08385, simple_loss=0.1026, pruned_loss=0.02336, audio_tagging_loss=0.009195, over 15151.00 frames. ], tot_loss[loss=0.08727, simple_loss=0.1061, pruned_loss=0.02367, audio_tagging_loss=0.01057, over 3054083.54 frames. 
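The WARNING entries above show why these cuts get dropped: each excluded AudioSet cut carries a dummy transcript of 24 BPE tokens, but its 100 feature frames shrink to 23 after the convolutional front end (with an overall subsampling factor of 4 and edge effects, ((100 - 7) // 2 + 1) // 2 = 23), and the transducer loss cannot align more tokens than encoder frames. A sketch of that filter predicate; the real check lives in the recipe's cut-filtering hook, and the cut/sp names here are illustrative:

    # Sketch of the cut filter behind the "Exclude cut with ID ..." warnings.
    # Assumes a conv front end with overall subsampling factor 4; `sp` stands
    # for a loaded SentencePiece model (illustrative names).
    def keep_cut(cut, sp) -> bool:
        frames_after = ((cut.num_frames - 7) // 2 + 1) // 2
        tokens = sp.encode(cut.supervisions[0].text, out_type=str)
        # Transducer training needs at least one encoder frame per token.
        return frames_after >= len(tokens)
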
], batch size: 57, lr: 7.56e-03, grad_scale: 16.0 2023-11-19 11:12:40,468 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=709240.0, ans=0.125 2023-11-19 11:12:40,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=709240.0, ans=0.125 2023-11-19 11:12:41,952 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.29 vs. limit=15.0 2023-11-19 11:12:54,567 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=709306.6666666666, ans=0.125 2023-11-19 11:12:56,600 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 11:13:03,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=709373.3333333334, ans=0.125 2023-11-19 11:13:06,567 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.459e+01 8.611e+01 9.302e+01 1.032e+02 1.464e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 11:13:30,928 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 10250, loss[loss=0.08016, simple_loss=0.106, pruned_loss=0.01896, audio_tagging_loss=0.008172, over 15298.00 frames. ], tot_loss[loss=0.08675, simple_loss=0.1053, pruned_loss=0.02345, audio_tagging_loss=0.01065, over 3050254.29 frames. ], batch size: 56, lr: 7.55e-03, grad_scale: 16.0 2023-11-19 11:13:40,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=709573.3333333334, ans=0.0 2023-11-19 11:13:49,482 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.40 vs. limit=15.0 2023-11-19 11:13:50,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=709640.0, ans=0.0 2023-11-19 11:13:53,094 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.87 vs. limit=15.0 2023-11-19 11:14:03,174 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.52 vs. limit=15.0 2023-11-19 11:14:16,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=709840.0, ans=0.125 2023-11-19 11:14:26,446 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 10300, loss[loss=0.08772, simple_loss=0.101, pruned_loss=0.0238, audio_tagging_loss=0.0134, over 15555.00 frames. ], tot_loss[loss=0.08714, simple_loss=0.1056, pruned_loss=0.02363, audio_tagging_loss=0.01072, over 3055570.78 frames. 
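The Clipping_scale entries report five quantiles (min, 25%, median, 75%, max) of recent per-batch gradient norms, and the printed threshold tracks clipping_scale times the median: above, 2.0 * 9.302e+01 = 1.860e+02, exactly the logged threshold, with percent-clipped counting how many recent batches exceeded it. A simplified stand-in for that bookkeeping (the real logic lives in the optimizer; the window size is illustrative):

    # Median-based gradient clipping as suggested by the
    # "Clipping_scale=2.0, grad-norm quartiles ..." entries (sketch).
    class GradNormClipper:
        def __init__(self, clipping_scale=2.0, window=128):
            self.clipping_scale = clipping_scale
            self.window = window
            self.norms = []

        def scale_for(self, grad_norm: float) -> float:
            # Keep a sliding window of recent norms; clip against the median.
            self.norms = (self.norms + [grad_norm])[-self.window:]
            median = sorted(self.norms)[len(self.norms) // 2]
            threshold = self.clipping_scale * median
            return min(1.0, threshold / (grad_norm + 1e-20))

Gradients are then multiplied by scale_for(norm), so a batch whose norm stays under roughly twice the running median passes through unchanged, which is why percent-clipped stays at 0.0 throughout this stretch.
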
], batch size: 59, lr: 7.55e-03, grad_scale: 16.0 2023-11-19 11:14:30,349 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=709906.6666666666, ans=0.0 2023-11-19 11:14:40,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=709973.3333333334, ans=0.125 2023-11-19 11:14:47,771 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.85 vs. limit=22.5 2023-11-19 11:14:58,177 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.799e+01 8.163e+01 8.880e+01 9.962e+01 1.363e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-19 11:15:07,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=710106.6666666666, ans=0.125 2023-11-19 11:15:23,157 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 10350, loss[loss=0.0871, simple_loss=0.1045, pruned_loss=0.02194, audio_tagging_loss=0.01292, over 17194.00 frames. ], tot_loss[loss=0.08799, simple_loss=0.1066, pruned_loss=0.02386, audio_tagging_loss=0.01082, over 3052286.66 frames. ], batch size: 64, lr: 7.55e-03, grad_scale: 16.0 2023-11-19 11:15:27,574 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=710240.0, ans=0.1 2023-11-19 11:15:39,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=710306.6666666666, ans=0.0 2023-11-19 11:15:47,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=710373.3333333334, ans=0.125 2023-11-19 11:16:00,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=710440.0, ans=0.0 2023-11-19 11:16:00,081 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=710440.0, ans=0.125 2023-11-19 11:16:03,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=710440.0, ans=0.05 2023-11-19 11:16:11,064 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=710506.6666666666, ans=0.0 2023-11-19 11:16:17,481 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=710573.3333333334, ans=0.125 2023-11-19 11:16:18,292 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 10400, loss[loss=0.09172, simple_loss=0.1203, pruned_loss=0.02103, audio_tagging_loss=0.01054, over 15892.00 frames. ], tot_loss[loss=0.08815, simple_loss=0.1069, pruned_loss=0.02391, audio_tagging_loss=0.0108, over 3045543.40 frames. 
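The ScheduledFloat entries print hyperparameters that vary with batch_count: dropout probabilities, skip rates, and balancer limits that relax as training progresses; by batch_count near 7.1e5 nearly all have settled at their final values (dropout_p=0.1, prob=0.125, skip rates 0.0). Assuming a piecewise-linear schedule between (batch_count, value) breakpoints, as in scaling.py, a minimal re-implementation:

    # Piecewise-linear schedule in the spirit of ScheduledFloat (sketch; the
    # real class is a torch-friendly object that supports arithmetic ops).
    def scheduled_float(batch_count, *points):
        # points: ((batch0, val0), (batch1, val1), ...) sorted by batch count.
        b0, v0 = points[0]
        if batch_count <= b0:
            return v0
        for b1, v1 in points[1:]:
            if batch_count <= b1:
                return v0 + (v1 - v0) * (batch_count - b0) / (b1 - b0)
            b0, v0 = b1, v1
        return v0  # past the last breakpoint: hold the final value

    # e.g. a dropout decaying from 0.3 to 0.1 over the first 20k batches has
    # long since reached 0.1 at the batch counts above:
    assert scheduled_float(710440.0, (0.0, 0.3), (20000.0, 0.1)) == 0.1
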
], batch size: 59, lr: 7.55e-03, grad_scale: 32.0 2023-11-19 11:16:18,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=710573.3333333334, ans=0.2 2023-11-19 11:16:37,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=710640.0, ans=0.125 2023-11-19 11:16:42,601 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=710706.6666666666, ans=0.125 2023-11-19 11:16:43,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=710706.6666666666, ans=0.125 2023-11-19 11:16:50,350 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.272e+01 8.278e+01 9.028e+01 1.000e+02 1.286e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-19 11:16:52,735 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=710773.3333333334, ans=0.125 2023-11-19 11:17:11,708 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.69 vs. limit=15.0 2023-11-19 11:17:14,233 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 10450, loss[loss=0.06821, simple_loss=0.08546, pruned_loss=0.01456, audio_tagging_loss=0.01092, over 14480.00 frames. ], tot_loss[loss=0.0869, simple_loss=0.1051, pruned_loss=0.02351, audio_tagging_loss=0.01084, over 3039115.30 frames. ], batch size: 54, lr: 7.55e-03, grad_scale: 32.0 2023-11-19 11:17:17,580 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=710906.6666666666, ans=0.0 2023-11-19 11:17:23,811 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.54 vs. limit=22.5 2023-11-19 11:18:03,905 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:18:10,453 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 10500, loss[loss=0.09852, simple_loss=0.117, pruned_loss=0.02863, audio_tagging_loss=0.0114, over 15594.00 frames. ], tot_loss[loss=0.08671, simple_loss=0.1052, pruned_loss=0.02343, audio_tagging_loss=0.01066, over 3040253.27 frames. ], batch size: 60, lr: 7.54e-03, grad_scale: 32.0 2023-11-19 11:18:19,700 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=711240.0, ans=0.2 2023-11-19 11:18:23,064 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.89 vs. limit=15.0 2023-11-19 11:18:23,961 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=711306.6666666666, ans=0.05 2023-11-19 11:18:27,092 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=711306.6666666666, ans=0.125 2023-11-19 11:18:27,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=711306.6666666666, ans=0.0 2023-11-19 11:18:27,413 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.60 vs. 
limit=15.0 2023-11-19 11:18:30,731 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=711306.6666666666, ans=0.2 2023-11-19 11:18:38,355 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=22.5 2023-11-19 11:18:41,179 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.988e+01 8.288e+01 9.011e+01 9.876e+01 1.227e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-19 11:19:06,049 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 10550, loss[loss=0.06707, simple_loss=0.0797, pruned_loss=0.01595, audio_tagging_loss=0.01127, over 14629.00 frames. ], tot_loss[loss=0.08594, simple_loss=0.1045, pruned_loss=0.02313, audio_tagging_loss=0.01055, over 3042011.68 frames. ], batch size: 56, lr: 7.54e-03, grad_scale: 32.0 2023-11-19 11:19:27,660 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=711706.6666666666, ans=0.2 2023-11-19 11:19:29,131 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.50 vs. limit=15.0 2023-11-19 11:19:33,820 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=711706.6666666666, ans=0.125 2023-11-19 11:19:39,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=711773.3333333334, ans=0.1 2023-11-19 11:19:45,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=711773.3333333334, ans=0.125 2023-11-19 11:19:52,229 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.64 vs. limit=22.5 2023-11-19 11:20:01,627 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 10600, loss[loss=0.06995, simple_loss=0.08191, pruned_loss=0.0186, audio_tagging_loss=0.01039, over 15138.00 frames. ], tot_loss[loss=0.08623, simple_loss=0.1048, pruned_loss=0.02338, audio_tagging_loss=0.01047, over 3042161.76 frames. ], batch size: 58, lr: 7.54e-03, grad_scale: 32.0 2023-11-19 11:20:05,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=711906.6666666666, ans=0.1 2023-11-19 11:20:06,249 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.34 vs. 
limit=22.5 2023-11-19 11:20:09,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=711906.6666666666, ans=0.0 2023-11-19 11:20:20,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=711973.3333333334, ans=0.2 2023-11-19 11:20:24,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=712040.0, ans=0.1 2023-11-19 11:20:26,393 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=712040.0, ans=0.0 2023-11-19 11:20:32,532 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.928e+01 8.185e+01 9.113e+01 1.014e+02 1.245e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-19 11:20:51,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=712173.3333333334, ans=0.1 2023-11-19 11:20:56,904 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 10650, loss[loss=0.06766, simple_loss=0.07536, pruned_loss=0.01628, audio_tagging_loss=0.0137, over 15134.00 frames. ], tot_loss[loss=0.08568, simple_loss=0.1041, pruned_loss=0.02318, audio_tagging_loss=0.01045, over 3035010.36 frames. ], batch size: 62, lr: 7.54e-03, grad_scale: 32.0 2023-11-19 11:21:03,786 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.22 vs. limit=22.5 2023-11-19 11:21:30,808 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=712440.0, ans=0.1 2023-11-19 11:21:30,916 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=712440.0, ans=0.1 2023-11-19 11:21:36,774 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=712440.0, ans=0.125 2023-11-19 11:21:38,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=712440.0, ans=0.0 2023-11-19 11:21:53,345 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 10700, loss[loss=0.08598, simple_loss=0.1019, pruned_loss=0.02216, audio_tagging_loss=0.01289, over 14379.00 frames. ], tot_loss[loss=0.08552, simple_loss=0.104, pruned_loss=0.02306, audio_tagging_loss=0.01046, over 3036109.57 frames. ], batch size: 55, lr: 7.54e-03, grad_scale: 32.0 2023-11-19 11:21:53,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=712573.3333333334, ans=0.125 2023-11-19 11:22:01,699 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.25 vs. 
limit=10.0 2023-11-19 11:22:02,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=712573.3333333334, ans=0.0 2023-11-19 11:22:15,324 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=712706.6666666666, ans=0.0 2023-11-19 11:22:24,045 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.779e+01 8.116e+01 8.823e+01 9.508e+01 1.570e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-19 11:22:26,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=712773.3333333334, ans=0.1 2023-11-19 11:22:29,528 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.49 vs. limit=22.5 2023-11-19 11:22:32,836 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=712773.3333333334, ans=0.0 2023-11-19 11:22:45,009 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=712840.0, ans=0.125 2023-11-19 11:22:47,198 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=712840.0, ans=0.125 2023-11-19 11:22:49,116 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 10750, loss[loss=0.0583, simple_loss=0.0628, pruned_loss=0.01491, audio_tagging_loss=0.01199, over 15327.00 frames. ], tot_loss[loss=0.08573, simple_loss=0.1043, pruned_loss=0.02315, audio_tagging_loss=0.01042, over 3039648.09 frames. ], batch size: 61, lr: 7.54e-03, grad_scale: 32.0 2023-11-19 11:22:56,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=712906.6666666666, ans=0.125 2023-11-19 11:23:04,101 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.45 vs. limit=22.5 2023-11-19 11:23:09,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=712973.3333333334, ans=0.125 2023-11-19 11:23:16,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=713040.0, ans=0.0 2023-11-19 11:23:32,101 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=713106.6666666666, ans=0.2 2023-11-19 11:23:38,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=713173.3333333334, ans=0.1 2023-11-19 11:23:44,960 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 10800, loss[loss=0.1063, simple_loss=0.1281, pruned_loss=0.03298, audio_tagging_loss=0.009248, over 14981.00 frames. ], tot_loss[loss=0.08561, simple_loss=0.1044, pruned_loss=0.02308, audio_tagging_loss=0.01036, over 3034337.19 frames. 
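The learning rate decays very gently across this stretch (7.56e-03 down to 7.53e-03 over a few hundred batches), which is characteristic of a quarter-power schedule far past its half-life. Assuming icefall's Eden scheduler, the rate combines a step factor and an epoch factor; the lr_batches and lr_epochs values below are illustrative defaults:

    # Eden-style learning-rate schedule (sketch): asymptotically decays as
    # step**-0.5 * epoch**-0.5, and is nearly flat early in training.
    def eden_lr(base_lr, step, epoch, lr_batches=7500.0, lr_epochs=3.5):
        step_factor = ((step**2 + lr_batches**2) / lr_batches**2) ** -0.25
        epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
        return base_lr * step_factor * epoch_factor

At this point in epoch 9 both factors change by well under a percent per thousand steps, matching the slow drift in the printed lr.
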
], batch size: 55, lr: 7.53e-03, grad_scale: 32.0 2023-11-19 11:24:02,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=713306.6666666666, ans=12.0 2023-11-19 11:24:17,317 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.693e+01 8.285e+01 8.937e+01 9.621e+01 1.564e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-19 11:24:34,927 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.15 vs. limit=15.0 2023-11-19 11:24:40,666 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 10850, loss[loss=0.06196, simple_loss=0.07325, pruned_loss=0.0102, audio_tagging_loss=0.01513, over 15063.00 frames. ], tot_loss[loss=0.08542, simple_loss=0.1042, pruned_loss=0.0229, audio_tagging_loss=0.01044, over 3034250.97 frames. ], batch size: 57, lr: 7.53e-03, grad_scale: 16.0 2023-11-19 11:25:00,081 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.17 vs. limit=10.0 2023-11-19 11:25:01,112 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.86 vs. limit=15.0 2023-11-19 11:25:07,669 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=713706.6666666666, ans=0.0 2023-11-19 11:25:08,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=713706.6666666666, ans=0.0 2023-11-19 11:25:32,925 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 11:25:36,647 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 10900, loss[loss=0.09064, simple_loss=0.1152, pruned_loss=0.02366, audio_tagging_loss=0.009364, over 15670.00 frames. ], tot_loss[loss=0.08501, simple_loss=0.1032, pruned_loss=0.02281, audio_tagging_loss=0.01062, over 3036746.57 frames. 
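The grad_scale field is the AMP loss scale used for fp16 training: it halves when a batch produces inf/nan gradients and grows back after a run of clean updates, which is why it moves between 32.0 and 16.0 in these entries without any corresponding jump in the loss curves. This matches torch.cuda.amp.GradScaler behaviour; the constructor values below are illustrative, not the recipe's:

    import torch

    # grad_scale in the log corresponds to the scaler's current loss scale:
    # halved on overflow (32 -> 16), doubled again after growth_interval
    # clean steps (16 -> 32). Values here are illustrative.
    scaler = torch.cuda.amp.GradScaler(
        init_scale=2.0**15,
        backoff_factor=0.5,
        growth_factor=2.0,
        growth_interval=2000,
    )

    # Typical use inside the training loop:
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()   # adjusts the loss scale, producing the logged values
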
], batch size: 56, lr: 7.53e-03, grad_scale: 16.0 2023-11-19 11:25:36,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=713906.6666666666, ans=0.1 2023-11-19 11:25:42,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=713906.6666666666, ans=0.2 2023-11-19 11:25:57,335 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=714040.0, ans=0.0 2023-11-19 11:26:02,067 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=714040.0, ans=0.125 2023-11-19 11:26:08,747 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.026e+01 8.550e+01 9.422e+01 1.018e+02 1.383e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-19 11:26:12,144 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=714106.6666666666, ans=0.0 2023-11-19 11:26:13,168 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=714106.6666666666, ans=0.125 2023-11-19 11:26:19,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=714173.3333333334, ans=0.0 2023-11-19 11:26:24,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=714173.3333333334, ans=0.125 2023-11-19 11:26:27,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=714173.3333333334, ans=0.0 2023-11-19 11:26:31,829 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 10950, loss[loss=0.07224, simple_loss=0.09059, pruned_loss=0.01464, audio_tagging_loss=0.01231, over 14775.00 frames. ], tot_loss[loss=0.08421, simple_loss=0.102, pruned_loss=0.02245, audio_tagging_loss=0.01077, over 3032024.62 frames. ], batch size: 55, lr: 7.53e-03, grad_scale: 16.0 2023-11-19 11:26:33,387 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.32 vs. limit=15.0 2023-11-19 11:26:46,289 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=714306.6666666666, ans=0.125 2023-11-19 11:26:50,407 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=714306.6666666666, ans=0.125 2023-11-19 11:26:55,624 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.52 vs. limit=15.0 2023-11-19 11:27:16,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=714506.6666666666, ans=0.2 2023-11-19 11:27:26,999 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 11000, loss[loss=0.07766, simple_loss=0.09174, pruned_loss=0.02319, audio_tagging_loss=0.008608, over 14386.00 frames. ], tot_loss[loss=0.08515, simple_loss=0.1033, pruned_loss=0.02268, audio_tagging_loss=0.01082, over 3037488.10 frames. 
], batch size: 54, lr: 7.53e-03, grad_scale: 16.0 2023-11-19 11:27:27,142 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:27:27,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=714573.3333333334, ans=0.125 2023-11-19 11:27:36,085 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 11:27:53,799 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=714706.6666666666, ans=0.035 2023-11-19 11:27:55,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=714706.6666666666, ans=0.5 2023-11-19 11:27:59,350 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.973e+01 8.544e+01 9.124e+01 1.001e+02 1.240e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-19 11:28:17,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=714840.0, ans=0.125 2023-11-19 11:28:22,460 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 11050, loss[loss=0.1211, simple_loss=0.1472, pruned_loss=0.04005, audio_tagging_loss=0.007475, over 15327.00 frames. ], tot_loss[loss=0.08615, simple_loss=0.1043, pruned_loss=0.02318, audio_tagging_loss=0.01084, over 3037586.91 frames. ], batch size: 56, lr: 7.53e-03, grad_scale: 16.0 2023-11-19 11:28:31,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=714906.6666666666, ans=0.125 2023-11-19 11:28:33,869 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=714973.3333333334, ans=0.1 2023-11-19 11:28:35,139 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.30 vs. limit=15.0 2023-11-19 11:28:38,614 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=714973.3333333334, ans=0.125 2023-11-19 11:28:46,259 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.23 vs. limit=22.5 2023-11-19 11:28:47,566 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=715040.0, ans=0.125 2023-11-19 11:29:03,864 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=715106.6666666666, ans=0.04949747468305833 2023-11-19 11:29:08,628 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=715173.3333333334, ans=0.1 2023-11-19 11:29:11,915 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.74 vs. 
limit=15.0 2023-11-19 11:29:17,830 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 11100, loss[loss=0.08413, simple_loss=0.1048, pruned_loss=0.02094, audio_tagging_loss=0.01076, over 15071.00 frames. ], tot_loss[loss=0.08632, simple_loss=0.1043, pruned_loss=0.02332, audio_tagging_loss=0.01087, over 3042784.77 frames. ], batch size: 57, lr: 7.52e-03, grad_scale: 16.0 2023-11-19 11:29:19,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=715240.0, ans=0.0 2023-11-19 11:29:29,623 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=715306.6666666666, ans=0.125 2023-11-19 11:29:50,611 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.040e+01 8.681e+01 9.190e+01 1.002e+02 1.339e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-19 11:29:55,077 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:29:57,219 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=715440.0, ans=0.125 2023-11-19 11:30:13,757 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 11150, loss[loss=0.09138, simple_loss=0.1328, pruned_loss=0.01948, audio_tagging_loss=0.005512, over 15195.00 frames. ], tot_loss[loss=0.08699, simple_loss=0.1049, pruned_loss=0.0236, audio_tagging_loss=0.01093, over 3043486.15 frames. ], batch size: 55, lr: 7.52e-03, grad_scale: 16.0 2023-11-19 11:30:20,314 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=715573.3333333334, ans=0.1 2023-11-19 11:30:21,511 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=715573.3333333334, ans=0.125 2023-11-19 11:30:27,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=715640.0, ans=0.0 2023-11-19 11:30:28,075 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.72 vs. limit=15.0 2023-11-19 11:30:29,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=715640.0, ans=0.5 2023-11-19 11:30:30,682 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.50 vs. limit=22.5 2023-11-19 11:30:39,401 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=715706.6666666666, ans=0.1 2023-11-19 11:30:51,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=715773.3333333334, ans=0.0 2023-11-19 11:31:07,275 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.46 vs. limit=15.0 2023-11-19 11:31:08,657 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 11200, loss[loss=0.07045, simple_loss=0.08761, pruned_loss=0.01537, audio_tagging_loss=0.01128, over 15258.00 frames. ], tot_loss[loss=0.08682, simple_loss=0.1047, pruned_loss=0.02346, audio_tagging_loss=0.01103, over 3038668.71 frames. 
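The Whitening entries track how far each module's activation covariance is from white: the metric is roughly the mean squared eigenvalue of the feature covariance divided by the squared mean eigenvalue, so it equals 1.0 for an isotropic covariance and grows as the spectrum spreads, and each entry prints the current metric against that module's limit. A simplified version of the diagnostic, modeled loosely on scaling.py's Whiten (details may differ):

    import torch

    # Whitening metric sketch: 1.0 when cov is a multiple of the identity,
    # larger as the eigenvalue spectrum spreads (sum of squared eigenvalues
    # over num_channels times the squared mean eigenvalue).
    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        x = x.reshape(-1, x.shape[-1]).float()   # (frames, channels)
        cov = x.t() @ x / x.shape[0]             # feature covariance
        mean_eig = torch.diagonal(cov).mean()    # mean eigenvalue == mean diag
        return (cov ** 2).sum() / (mean_eig ** 2 * x.shape[-1])

When the metric exceeds the limit, the module applies a small gradient penalty that pushes activations back toward a whiter covariance, which keeps the logged metrics hovering near their limits.
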
], batch size: 56, lr: 7.52e-03, grad_scale: 32.0 2023-11-19 11:31:22,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=715973.3333333334, ans=0.0 2023-11-19 11:31:28,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=715973.3333333334, ans=0.125 2023-11-19 11:31:31,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=716040.0, ans=0.2 2023-11-19 11:31:41,667 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.895e+01 8.422e+01 9.292e+01 1.018e+02 1.279e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-19 11:31:46,302 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.95 vs. limit=12.0 2023-11-19 11:31:52,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=716173.3333333334, ans=0.0 2023-11-19 11:31:54,638 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.65 vs. limit=15.0 2023-11-19 11:31:55,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=716173.3333333334, ans=0.2 2023-11-19 11:32:05,050 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 11250, loss[loss=0.08578, simple_loss=0.1047, pruned_loss=0.02359, audio_tagging_loss=0.009832, over 14883.00 frames. ], tot_loss[loss=0.08658, simple_loss=0.1044, pruned_loss=0.02342, audio_tagging_loss=0.01095, over 3042822.69 frames. ], batch size: 56, lr: 7.52e-03, grad_scale: 32.0 2023-11-19 11:32:21,040 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=716306.6666666666, ans=0.125 2023-11-19 11:32:28,607 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.88 vs. limit=15.0 2023-11-19 11:32:38,770 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.55 vs. limit=6.0 2023-11-19 11:32:54,318 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=716506.6666666666, ans=0.09899494936611666 2023-11-19 11:33:00,952 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 11300, loss[loss=0.06278, simple_loss=0.07165, pruned_loss=0.01551, audio_tagging_loss=0.01145, over 15574.00 frames. ], tot_loss[loss=0.08637, simple_loss=0.1043, pruned_loss=0.02345, audio_tagging_loss=0.01075, over 3041813.02 frames. 
], batch size: 59, lr: 7.52e-03, grad_scale: 32.0 2023-11-19 11:33:08,619 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=716573.3333333334, ans=0.2 2023-11-19 11:33:33,221 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.199e+01 8.698e+01 9.365e+01 1.016e+02 1.421e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-19 11:33:43,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=716773.3333333334, ans=0.2 2023-11-19 11:33:44,373 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=716840.0, ans=0.0 2023-11-19 11:33:55,808 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 11350, loss[loss=0.08667, simple_loss=0.1138, pruned_loss=0.02242, audio_tagging_loss=0.007357, over 16115.00 frames. ], tot_loss[loss=0.08562, simple_loss=0.1039, pruned_loss=0.02308, audio_tagging_loss=0.01058, over 3041712.48 frames. ], batch size: 59, lr: 7.51e-03, grad_scale: 32.0 2023-11-19 11:33:57,127 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=716906.6666666666, ans=0.1 2023-11-19 11:34:12,064 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:34:23,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=717040.0, ans=0.1 2023-11-19 11:34:32,935 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=717106.6666666666, ans=0.125 2023-11-19 11:34:33,041 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=717106.6666666666, ans=0.0 2023-11-19 11:34:34,116 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=717106.6666666666, ans=0.125 2023-11-19 11:34:43,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=717173.3333333334, ans=0.2 2023-11-19 11:34:51,226 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 11400, loss[loss=0.08753, simple_loss=0.1118, pruned_loss=0.02308, audio_tagging_loss=0.008544, over 15967.00 frames. ], tot_loss[loss=0.08505, simple_loss=0.1033, pruned_loss=0.02286, audio_tagging_loss=0.01055, over 3043385.49 frames. ], batch size: 59, lr: 7.51e-03, grad_scale: 32.0 2023-11-19 11:34:54,169 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=717240.0, ans=0.2 2023-11-19 11:35:15,124 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=717373.3333333334, ans=0.0 2023-11-19 11:35:22,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=717373.3333333334, ans=0.0 2023-11-19 11:35:24,346 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.257e+01 8.952e+01 9.927e+01 1.340e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-19 11:35:46,993 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 11450, loss[loss=0.09572, simple_loss=0.1196, pruned_loss=0.02699, audio_tagging_loss=0.008955, over 15328.00 frames. 
], tot_loss[loss=0.08579, simple_loss=0.1043, pruned_loss=0.02321, audio_tagging_loss=0.01042, over 3050231.18 frames. ], batch size: 58, lr: 7.51e-03, grad_scale: 16.0 2023-11-19 11:35:59,313 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=717640.0, ans=0.125 2023-11-19 11:36:31,974 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.33 vs. limit=15.0 2023-11-19 11:36:40,038 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=717840.0, ans=0.0 2023-11-19 11:36:41,943 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 11500, loss[loss=0.08489, simple_loss=0.1122, pruned_loss=0.01731, audio_tagging_loss=0.01147, over 13158.00 frames. ], tot_loss[loss=0.08695, simple_loss=0.106, pruned_loss=0.02357, audio_tagging_loss=0.01036, over 3057540.14 frames. ], batch size: 53, lr: 7.51e-03, grad_scale: 16.0 2023-11-19 11:37:03,228 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=718040.0, ans=0.125 2023-11-19 11:37:05,852 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=718040.0, ans=0.0 2023-11-19 11:37:15,739 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.085e+01 8.484e+01 9.226e+01 9.872e+01 1.474e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-19 11:37:31,922 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.43 vs. limit=15.0 2023-11-19 11:37:37,519 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 11550, loss[loss=0.07264, simple_loss=0.09227, pruned_loss=0.01679, audio_tagging_loss=0.009719, over 13736.00 frames. ], tot_loss[loss=0.08706, simple_loss=0.106, pruned_loss=0.02363, audio_tagging_loss=0.01044, over 3055599.40 frames. ], batch size: 53, lr: 7.51e-03, grad_scale: 16.0 2023-11-19 11:37:39,034 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.74 vs. limit=15.0 2023-11-19 11:37:39,186 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=15.0 2023-11-19 11:37:40,848 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=718240.0, ans=0.125 2023-11-19 11:37:40,947 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=718240.0, ans=0.2 2023-11-19 11:38:05,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.99 vs. limit=6.0 2023-11-19 11:38:07,753 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=718373.3333333334, ans=10.0 2023-11-19 11:38:11,666 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 11:38:13,709 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.51 vs. limit=6.0 2023-11-19 11:38:32,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=718573.3333333334, ans=0.2 2023-11-19 11:38:33,212 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 11600, loss[loss=0.104, simple_loss=0.1205, pruned_loss=0.03316, audio_tagging_loss=0.01061, over 15682.00 frames. ], tot_loss[loss=0.0872, simple_loss=0.1061, pruned_loss=0.02373, audio_tagging_loss=0.01042, over 3053410.28 frames. ], batch size: 56, lr: 7.51e-03, grad_scale: 32.0 2023-11-19 11:38:36,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=718573.3333333334, ans=0.1 2023-11-19 11:38:38,106 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=718573.3333333334, ans=0.125 2023-11-19 11:38:42,489 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.29 vs. limit=15.0 2023-11-19 11:38:45,405 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=718640.0, ans=0.125 2023-11-19 11:38:48,008 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.22 vs. limit=15.0 2023-11-19 11:39:00,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=718706.6666666666, ans=0.1 2023-11-19 11:39:06,020 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.247e+01 8.169e+01 9.328e+01 1.011e+02 1.273e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-19 11:39:07,265 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:39:07,299 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=718773.3333333334, ans=0.04949747468305833 2023-11-19 11:39:11,138 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=718773.3333333334, ans=0.2 2023-11-19 11:39:19,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=718840.0, ans=0.125 2023-11-19 11:39:23,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=718840.0, ans=0.0 2023-11-19 11:39:28,769 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 11650, loss[loss=0.05584, simple_loss=0.06323, pruned_loss=0.01296, audio_tagging_loss=0.01126, over 15000.00 frames. ], tot_loss[loss=0.08686, simple_loss=0.1055, pruned_loss=0.02364, audio_tagging_loss=0.01045, over 3041269.90 frames. 
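The tot_loss aggregates are reported over fractional frame counts (3041269.90 frames above), which points to an exponentially decayed running sum rather than an exact window: each update multiplies the accumulated loss and frame totals by a decay close to 1 before adding the new batch, so with a decay of 1 - 1/200 and roughly 15k frames per batch the frame total settles near 200 * 15k = 3.0e6, as printed. A minimal sketch of that bookkeeping; the decay constant is an assumption:

    # Decayed running aggregate consistent with the fractional frame counts
    # in the tot_loss entries (sketch; decay of 1 - 1/200 is an assumption).
    class RunningLoss:
        def __init__(self, decay=1.0 - 1.0 / 200):
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss_sum: float, batch_frames: int) -> None:
            self.loss_sum = self.decay * self.loss_sum + batch_loss_sum
            self.frames = self.decay * self.frames + batch_frames

        def avg(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)
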
], batch size: 56, lr: 7.50e-03, grad_scale: 32.0 2023-11-19 11:40:07,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=719106.6666666666, ans=0.125 2023-11-19 11:40:10,496 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.52 vs. limit=10.0 2023-11-19 11:40:13,817 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.78 vs. limit=15.0 2023-11-19 11:40:19,625 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.76 vs. limit=15.0 2023-11-19 11:40:24,340 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 11700, loss[loss=0.07332, simple_loss=0.08423, pruned_loss=0.01991, audio_tagging_loss=0.0113, over 15618.00 frames. ], tot_loss[loss=0.0864, simple_loss=0.1049, pruned_loss=0.02347, audio_tagging_loss=0.01048, over 3040639.54 frames. ], batch size: 60, lr: 7.50e-03, grad_scale: 32.0 2023-11-19 11:40:29,812 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=719240.0, ans=0.125 2023-11-19 11:40:56,697 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=719440.0, ans=0.125 2023-11-19 11:40:57,451 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.764e+01 8.259e+01 8.836e+01 9.705e+01 1.355e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-19 11:41:19,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=719573.3333333334, ans=0.125 2023-11-19 11:41:19,883 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 11750, loss[loss=0.1109, simple_loss=0.1388, pruned_loss=0.03386, audio_tagging_loss=0.007666, over 14774.00 frames. ], tot_loss[loss=0.08677, simple_loss=0.1053, pruned_loss=0.02359, audio_tagging_loss=0.01056, over 3035976.92 frames. ], batch size: 53, lr: 7.50e-03, grad_scale: 32.0 2023-11-19 11:41:27,046 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=719573.3333333334, ans=0.125 2023-11-19 11:41:38,702 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=719640.0, ans=0.125 2023-11-19 11:41:43,218 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.69 vs. limit=15.0 2023-11-19 11:41:45,979 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.44 vs. 
2023-11-19 11:41:50,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=719706.6666666666, ans=0.0
2023-11-19 11:41:53,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=719773.3333333334, ans=0.2
2023-11-19 11:42:07,003 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=719840.0, ans=0.0
2023-11-19 11:42:14,819 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 11800, loss[loss=0.1157, simple_loss=0.1398, pruned_loss=0.03623, audio_tagging_loss=0.009527, over 15791.00 frames. ], tot_loss[loss=0.08642, simple_loss=0.1043, pruned_loss=0.02352, audio_tagging_loss=0.01073, over 3036058.42 frames. ], batch size: 56, lr: 7.50e-03, grad_scale: 32.0
2023-11-19 11:42:25,712 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.47 vs. limit=15.0
2023-11-19 11:42:49,865 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 8.669e+01 9.441e+01 1.062e+02 1.406e+02, threshold=1.888e+02, percent-clipped=0.0
2023-11-19 11:43:07,776 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.40 vs. limit=10.0
2023-11-19 11:43:11,844 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 11850, loss[loss=0.09178, simple_loss=0.1199, pruned_loss=0.02297, audio_tagging_loss=0.008849, over 15502.00 frames. ], tot_loss[loss=0.08708, simple_loss=0.1055, pruned_loss=0.0237, audio_tagging_loss=0.01062, over 3036700.62 frames. ], batch size: 57, lr: 7.50e-03, grad_scale: 32.0
2023-11-19 11:43:18,382 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=720240.0, ans=0.125
2023-11-19 11:43:40,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=720373.3333333334, ans=0.0
2023-11-19 11:43:58,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=720506.6666666666, ans=0.125
2023-11-19 11:44:07,212 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 11900, loss[loss=0.0964, simple_loss=0.124, pruned_loss=0.02194, audio_tagging_loss=0.01248, over 15053.00 frames. ], tot_loss[loss=0.08637, simple_loss=0.1046, pruned_loss=0.02332, audio_tagging_loss=0.01076, over 3039564.69 frames. ], batch size: 57, lr: 7.50e-03, grad_scale: 16.0
2023-11-19 11:44:11,054 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.98 vs. limit=22.5
2023-11-19 11:44:27,359 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=720640.0, ans=0.0
2023-11-19 11:44:41,510 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.859e+01 8.394e+01 8.886e+01 9.944e+01 1.266e+02, threshold=1.777e+02, percent-clipped=0.0
2023-11-19 11:44:41,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=720773.3333333334, ans=0.1
2023-11-19 11:44:41,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=720773.3333333334, ans=0.1
2023-11-19 11:44:55,137 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 11:45:02,654 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=720906.6666666666, ans=0.0
2023-11-19 11:45:03,434 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 11950, loss[loss=0.08701, simple_loss=0.1066, pruned_loss=0.02612, audio_tagging_loss=0.007565, over 15308.00 frames. ], tot_loss[loss=0.08638, simple_loss=0.1045, pruned_loss=0.0233, audio_tagging_loss=0.01082, over 3040433.03 frames. ], batch size: 57, lr: 7.49e-03, grad_scale: 16.0
2023-11-19 11:45:03,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=720906.6666666666, ans=0.125
2023-11-19 11:45:16,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=720973.3333333334, ans=0.07
2023-11-19 11:45:48,604 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.25 vs. limit=10.0
2023-11-19 11:45:56,802 INFO [train_asr.py:1115] (3/4) Epoch 9, batch 12000, loss[loss=0.07966, simple_loss=0.09465, pruned_loss=0.01947, audio_tagging_loss=0.01286, over 15485.00 frames. ], tot_loss[loss=0.08693, simple_loss=0.1052, pruned_loss=0.02347, audio_tagging_loss=0.01086, over 3038291.61 frames. ], batch size: 58, lr: 7.49e-03, grad_scale: 32.0
2023-11-19 11:45:56,802 INFO [train_asr.py:1138] (3/4) Computing validation loss
2023-11-19 11:46:16,599 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3156, 4.9937, 4.8481, 5.1793], device='cuda:3')
2023-11-19 11:46:29,171 INFO [train_asr.py:1147] (3/4) Epoch 9, validation: loss=0.06606, simple_loss=0.05578, pruned_loss=0.006612, audio_tagging_loss=0.03155, over 4681554.00 frames.
2023-11-19 11:46:29,172 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB
2023-11-19 11:46:41,780 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0
2023-11-19 11:46:43,783 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=721306.6666666666, ans=0.2
2023-11-19 11:46:51,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=721373.3333333334, ans=0.0
2023-11-19 11:47:32,065 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 0, loss[loss=0.1005, simple_loss=0.1233, pruned_loss=0.01874, audio_tagging_loss=0.02007, over 15003.00 frames. ], tot_loss[loss=0.1005, simple_loss=0.1233, pruned_loss=0.01874, audio_tagging_loss=0.02007, over 15003.00 frames. ], batch size: 55, lr: 7.12e-03, grad_scale: 32.0
2023-11-19 11:47:32,066 INFO [train_asr.py:1138] (3/4) Computing validation loss
2023-11-19 11:47:53,560 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3265, 4.9633, 4.8189, 5.2181], device='cuda:3')
2023-11-19 11:48:03,895 INFO [train_asr.py:1147] (3/4) Epoch 10, validation: loss=0.06458, simple_loss=0.05578, pruned_loss=0.006606, audio_tagging_loss=0.03009, over 4681554.00 frames.
2023-11-19 11:48:03,896 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB
2023-11-19 11:48:08,311 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=721400.0, ans=0.1
2023-11-19 11:48:11,157 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.461e+01 9.125e+01 9.697e+01 1.516e+02, threshold=1.825e+02, percent-clipped=0.0
2023-11-19 11:48:17,394 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=721466.6666666666, ans=0.09899494936611666
2023-11-19 11:48:18,386 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=721466.6666666666, ans=0.125
2023-11-19 11:48:18,969 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.62 vs. limit=10.0
2023-11-19 11:48:36,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=721600.0, ans=0.125
2023-11-19 11:48:55,847 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.76 vs. limit=5.0
2023-11-19 11:48:59,736 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 50, loss[loss=0.1027, simple_loss=0.1225, pruned_loss=0.023, audio_tagging_loss=0.01841, over 14609.00 frames. ], tot_loss[loss=0.09585, simple_loss=0.1058, pruned_loss=0.02317, audio_tagging_loss=0.01979, over 687631.10 frames. ], batch size: 54, lr: 7.12e-03, grad_scale: 32.0
2023-11-19 11:49:04,605 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=721733.3333333334, ans=0.1
2023-11-19 11:49:11,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=721800.0, ans=0.2
2023-11-19 11:49:26,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=721866.6666666666, ans=0.1
2023-11-19 11:49:48,211 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=722000.0, ans=0.0
2023-11-19 11:49:54,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=722066.6666666666, ans=0.125
2023-11-19 11:49:55,460 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 100, loss[loss=0.09156, simple_loss=0.1019, pruned_loss=0.02263, audio_tagging_loss=0.01798, over 16282.00 frames. ], tot_loss[loss=0.09545, simple_loss=0.1062, pruned_loss=0.02316, audio_tagging_loss=0.01917, over 1210313.42 frames. ], batch size: 61, lr: 7.12e-03, grad_scale: 32.0
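Every validation entry reports the same "over 4681554.00 frames" denominator, i.e. the loss is averaged over a fixed dev set each time, which is what makes the epoch 9 and epoch 10 validation numbers directly comparable. A minimal sketch of frame-weighted averaging that would produce this constant denominator (an assumed form, not the actual train_asr.py code):

    # Hedged sketch: accumulate frame-weighted batch losses over a fixed dev set.
    def validation_loss(batches):
        # batches: iterable of (mean_loss, num_frames) pairs from the dev loader
        tot_loss = sum(loss * frames for loss, frames in batches)
        tot_frames = sum(frames for _, frames in batches)
        return tot_loss / tot_frames  # tot_frames is identical on every run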
2023-11-19 11:50:03,451 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.378e+01 8.830e+01 9.521e+01 1.052e+02 1.360e+02, threshold=1.904e+02, percent-clipped=0.0
2023-11-19 11:50:23,591 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=722200.0, ans=10.0
2023-11-19 11:50:27,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=722200.0, ans=0.2
2023-11-19 11:50:51,919 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 150, loss[loss=0.05714, simple_loss=0.06257, pruned_loss=0.01109, audio_tagging_loss=0.01477, over 14894.00 frames. ], tot_loss[loss=0.09325, simple_loss=0.1065, pruned_loss=0.02295, audio_tagging_loss=0.01702, over 1614885.88 frames. ], batch size: 57, lr: 7.12e-03, grad_scale: 32.0
2023-11-19 11:50:52,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=722400.0, ans=0.125
2023-11-19 11:51:37,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=722666.6666666666, ans=0.2
2023-11-19 11:51:37,615 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=722666.6666666666, ans=0.125
2023-11-19 11:51:46,170 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=722666.6666666666, ans=0.0
2023-11-19 11:51:47,213 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=722733.3333333334, ans=0.125
2023-11-19 11:51:47,935 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 200, loss[loss=0.07513, simple_loss=0.09697, pruned_loss=0.01779, audio_tagging_loss=0.00885, over 15133.00 frames. ], tot_loss[loss=0.0914, simple_loss=0.1066, pruned_loss=0.02314, audio_tagging_loss=0.01495, over 1936731.76 frames. ], batch size: 58, lr: 7.12e-03, grad_scale: 32.0
2023-11-19 11:51:49,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=722733.3333333334, ans=0.1
2023-11-19 11:51:56,596 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.422e+01 8.351e+01 9.298e+01 1.028e+02 1.327e+02, threshold=1.860e+02, percent-clipped=0.0
2023-11-19 11:52:02,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=722800.0, ans=0.1
2023-11-19 11:52:03,315 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=722800.0, ans=0.0
2023-11-19 11:52:07,517 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=722800.0, ans=0.0
2023-11-19 11:52:26,865 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=722933.3333333334, ans=0.125
2023-11-19 11:52:31,198 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.16 vs. limit=22.5
2023-11-19 11:52:34,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=723000.0, ans=0.07
2023-11-19 11:52:35,867 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=723000.0, ans=0.125
2023-11-19 11:52:42,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=723000.0, ans=0.1
2023-11-19 11:52:43,786 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=723066.6666666666, ans=0.125
2023-11-19 11:52:44,661 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 250, loss[loss=0.082, simple_loss=0.1049, pruned_loss=0.02114, audio_tagging_loss=0.008396, over 14289.00 frames. ], tot_loss[loss=0.09109, simple_loss=0.1079, pruned_loss=0.02353, audio_tagging_loss=0.01359, over 2185732.80 frames. ], batch size: 52, lr: 7.12e-03, grad_scale: 32.0
2023-11-19 11:53:01,754 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=723133.3333333334, ans=0.0
2023-11-19 11:53:07,612 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.66 vs. limit=15.0
2023-11-19 11:53:20,327 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=723266.6666666666, ans=0.125
2023-11-19 11:53:40,159 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 300, loss[loss=0.08231, simple_loss=0.1034, pruned_loss=0.01918, audio_tagging_loss=0.01141, over 15115.00 frames. ], tot_loss[loss=0.09076, simple_loss=0.1085, pruned_loss=0.02392, audio_tagging_loss=0.01257, over 2381579.43 frames. ], batch size: 56, lr: 7.11e-03, grad_scale: 32.0
2023-11-19 11:53:48,085 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.717e+01 8.399e+01 9.137e+01 9.924e+01 1.644e+02, threshold=1.827e+02, percent-clipped=0.0
2023-11-19 11:53:48,256 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=723400.0, ans=0.125
2023-11-19 11:54:05,403 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=723533.3333333334, ans=0.0
2023-11-19 11:54:12,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=723600.0, ans=0.0
2023-11-19 11:54:27,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=723666.6666666666, ans=0.0
2023-11-19 11:54:31,448 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.23 vs. limit=12.0
2023-11-19 11:54:35,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=723733.3333333334, ans=0.125
2023-11-19 11:54:36,132 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 350, loss[loss=0.08952, simple_loss=0.1031, pruned_loss=0.0255, audio_tagging_loss=0.01249, over 15184.00 frames. ], tot_loss[loss=0.08979, simple_loss=0.1078, pruned_loss=0.02388, audio_tagging_loss=0.01203, over 2528123.91 frames. ], batch size: 56, lr: 7.11e-03, grad_scale: 32.0
2023-11-19 11:54:38,947 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.31 vs. limit=15.0
2023-11-19 11:54:50,266 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=723800.0, ans=0.0
2023-11-19 11:54:57,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=723866.6666666666, ans=0.125
2023-11-19 11:55:14,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=723933.3333333334, ans=0.125
2023-11-19 11:55:29,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=724000.0, ans=0.0
2023-11-19 11:55:32,496 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 400, loss[loss=0.09705, simple_loss=0.1242, pruned_loss=0.02273, audio_tagging_loss=0.01221, over 15557.00 frames. ], tot_loss[loss=0.08864, simple_loss=0.1069, pruned_loss=0.02351, audio_tagging_loss=0.01167, over 2644147.36 frames. ], batch size: 56, lr: 7.11e-03, grad_scale: 32.0
2023-11-19 11:55:35,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=724066.6666666666, ans=0.0
2023-11-19 11:55:39,903 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.670e+01 8.520e+01 9.395e+01 1.002e+02 1.359e+02, threshold=1.879e+02, percent-clipped=0.0
2023-11-19 11:55:41,237 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=724066.6666666666, ans=0.125
2023-11-19 11:55:56,242 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.29 vs. limit=22.5
2023-11-19 11:56:02,361 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=724200.0, ans=0.125
2023-11-19 11:56:28,225 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 450, loss[loss=0.09606, simple_loss=0.118, pruned_loss=0.02463, audio_tagging_loss=0.01241, over 16404.00 frames. ], tot_loss[loss=0.08831, simple_loss=0.107, pruned_loss=0.02348, audio_tagging_loss=0.01132, over 2732608.95 frames. ], batch size: 60, lr: 7.11e-03, grad_scale: 32.0
2023-11-19 11:56:30,466 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 11:56:52,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=724533.3333333334, ans=0.2
2023-11-19 11:57:01,963 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=724600.0, ans=0.2
2023-11-19 11:57:12,520 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. limit=6.0
2023-11-19 11:57:13,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=724666.6666666666, ans=0.0
2023-11-19 11:57:22,309 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. limit=6.0
2023-11-19 11:57:23,927 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 500, loss[loss=0.07295, simple_loss=0.0865, pruned_loss=0.02073, audio_tagging_loss=0.008977, over 15656.00 frames. ], tot_loss[loss=0.08788, simple_loss=0.1063, pruned_loss=0.02357, audio_tagging_loss=0.01117, over 2802703.93 frames. ], batch size: 60, lr: 7.11e-03, grad_scale: 32.0
2023-11-19 11:57:26,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=724733.3333333334, ans=0.0
2023-11-19 11:57:27,224 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=724733.3333333334, ans=0.0
2023-11-19 11:57:31,351 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.489e+01 8.496e+01 9.010e+01 1.030e+02 1.418e+02, threshold=1.802e+02, percent-clipped=0.0
2023-11-19 11:57:34,646 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.32 vs. limit=12.0
2023-11-19 11:57:52,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=724866.6666666666, ans=0.125
2023-11-19 11:57:58,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=724933.3333333334, ans=0.0
2023-11-19 11:58:09,766 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=725000.0, ans=0.125
2023-11-19 11:58:09,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=725000.0, ans=0.2
2023-11-19 11:58:18,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=725066.6666666666, ans=0.125
2023-11-19 11:58:19,508 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 550, loss[loss=0.04183, simple_loss=0.04445, pruned_loss=0.008398, audio_tagging_loss=0.01121, over 15013.00 frames. ], tot_loss[loss=0.08693, simple_loss=0.105, pruned_loss=0.02327, audio_tagging_loss=0.01116, over 2852549.05 frames. ], batch size: 58, lr: 7.11e-03, grad_scale: 32.0
2023-11-19 11:58:21,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=725066.6666666666, ans=0.125
2023-11-19 11:58:27,882 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=725066.6666666666, ans=0.125
2023-11-19 11:58:32,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=725133.3333333334, ans=0.125
2023-11-19 11:58:39,058 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=725133.3333333334, ans=0.0
2023-11-19 11:59:11,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=725333.3333333334, ans=0.2
2023-11-19 11:59:14,624 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=725400.0, ans=0.125
2023-11-19 11:59:15,992 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 600, loss[loss=0.07727, simple_loss=0.0856, pruned_loss=0.02234, audio_tagging_loss=0.01213, over 15430.00 frames. ], tot_loss[loss=0.087, simple_loss=0.1055, pruned_loss=0.02331, audio_tagging_loss=0.01095, over 2885773.37 frames. ], batch size: 60, lr: 7.10e-03, grad_scale: 32.0
2023-11-19 11:59:23,908 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.497e+01 8.301e+01 8.690e+01 9.584e+01 1.385e+02, threshold=1.738e+02, percent-clipped=0.0
2023-11-19 11:59:39,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=725533.3333333334, ans=0.2
2023-11-19 11:59:50,358 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0
2023-11-19 11:59:54,482 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=725600.0, ans=0.125
2023-11-19 12:00:11,789 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 650, loss[loss=0.1164, simple_loss=0.1484, pruned_loss=0.03266, audio_tagging_loss=0.009534, over 15384.00 frames. ], tot_loss[loss=0.08684, simple_loss=0.1054, pruned_loss=0.02324, audio_tagging_loss=0.01087, over 2922455.37 frames. ], batch size: 54, lr: 7.10e-03, grad_scale: 32.0
2023-11-19 12:00:16,568 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.25 vs. limit=22.5
2023-11-19 12:00:17,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=725733.3333333334, ans=0.035
2023-11-19 12:00:58,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=726000.0, ans=0.125
2023-11-19 12:01:06,896 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 700, loss[loss=0.06013, simple_loss=0.07042, pruned_loss=0.01351, audio_tagging_loss=0.01141, over 14968.00 frames. ], tot_loss[loss=0.08666, simple_loss=0.1054, pruned_loss=0.02326, audio_tagging_loss=0.01069, over 2947604.58 frames. ], batch size: 58, lr: 7.10e-03, grad_scale: 32.0
2023-11-19 12:01:08,421 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.50 vs. limit=15.0
2023-11-19 12:01:14,251 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.015e+01 8.278e+01 8.967e+01 9.847e+01 1.279e+02, threshold=1.793e+02, percent-clipped=0.0
2023-11-19 12:01:30,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=726200.0, ans=0.125
2023-11-19 12:01:45,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=726266.6666666666, ans=0.2
2023-11-19 12:02:02,256 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 750, loss[loss=0.0944, simple_loss=0.1137, pruned_loss=0.02558, audio_tagging_loss=0.01198, over 15348.00 frames. ], tot_loss[loss=0.0874, simple_loss=0.1065, pruned_loss=0.02348, audio_tagging_loss=0.01067, over 2979187.36 frames. ], batch size: 57, lr: 7.10e-03, grad_scale: 32.0
2023-11-19 12:02:19,117 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=726466.6666666666, ans=0.125
2023-11-19 12:02:45,154 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=726600.0, ans=0.0
2023-11-19 12:02:59,397 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 800, loss[loss=0.08286, simple_loss=0.1054, pruned_loss=0.02182, audio_tagging_loss=0.008314, over 15562.00 frames. ], tot_loss[loss=0.08707, simple_loss=0.1065, pruned_loss=0.02321, audio_tagging_loss=0.01063, over 2996381.26 frames. ], batch size: 56, lr: 7.10e-03, grad_scale: 32.0
2023-11-19 12:03:06,736 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 8.428e+01 9.274e+01 1.007e+02 1.434e+02, threshold=1.855e+02, percent-clipped=0.0
2023-11-19 12:03:12,678 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=726800.0, ans=0.0
2023-11-19 12:03:22,137 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=726866.6666666666, ans=0.0
2023-11-19 12:03:24,107 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=726866.6666666666, ans=0.0
2023-11-19 12:03:33,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=726933.3333333334, ans=0.2
2023-11-19 12:03:33,242 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=726933.3333333334, ans=0.2
2023-11-19 12:03:54,768 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 850, loss[loss=0.07064, simple_loss=0.08169, pruned_loss=0.01783, audio_tagging_loss=0.01197, over 17012.00 frames. ], tot_loss[loss=0.08661, simple_loss=0.1054, pruned_loss=0.02319, audio_tagging_loss=0.01073, over 3003848.54 frames. ], batch size: 64, lr: 7.10e-03, grad_scale: 32.0
2023-11-19 12:04:12,383 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=727133.3333333334, ans=0.1
2023-11-19 12:04:16,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=727200.0, ans=0.0
2023-11-19 12:04:23,285 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=727200.0, ans=0.2
2023-11-19 12:04:28,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=727266.6666666666, ans=0.125
2023-11-19 12:04:31,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=727266.6666666666, ans=0.125
2023-11-19 12:04:50,020 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 900, loss[loss=0.06693, simple_loss=0.08279, pruned_loss=0.01715, audio_tagging_loss=0.008387, over 14456.00 frames. ], tot_loss[loss=0.08699, simple_loss=0.1056, pruned_loss=0.02337, audio_tagging_loss=0.0108, over 3014996.09 frames. ], batch size: 56, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 12:04:57,840 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.860e+01 8.263e+01 8.793e+01 9.779e+01 1.235e+02, threshold=1.759e+02, percent-clipped=0.0
2023-11-19 12:05:06,570 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=727466.6666666666, ans=0.1
2023-11-19 12:05:07,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=727466.6666666666, ans=0.1
2023-11-19 12:05:40,352 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=727666.6666666666, ans=0.07
2023-11-19 12:05:46,700 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 950, loss[loss=0.09343, simple_loss=0.1255, pruned_loss=0.02323, audio_tagging_loss=0.007454, over 14711.00 frames. ], tot_loss[loss=0.08748, simple_loss=0.1066, pruned_loss=0.02364, audio_tagging_loss=0.01053, over 3031360.97 frames. ], batch size: 55, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 12:05:55,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=727733.3333333334, ans=0.125
2023-11-19 12:05:58,715 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.89 vs. limit=15.0
2023-11-19 12:06:31,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=728000.0, ans=0.125
2023-11-19 12:06:40,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=728000.0, ans=0.1
2023-11-19 12:06:42,008 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 1000, loss[loss=0.08964, simple_loss=0.1032, pruned_loss=0.02698, audio_tagging_loss=0.01104, over 16431.00 frames. ], tot_loss[loss=0.08735, simple_loss=0.1069, pruned_loss=0.02355, audio_tagging_loss=0.01036, over 3038426.73 frames. ], batch size: 65, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 12:06:47,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=728066.6666666666, ans=0.0
2023-11-19 12:06:49,834 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.784e+01 8.258e+01 8.941e+01 9.779e+01 1.255e+02, threshold=1.788e+02, percent-clipped=0.0
2023-11-19 12:07:05,202 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 12:07:06,440 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=728200.0, ans=0.0
2023-11-19 12:07:11,701 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.37 vs. limit=22.5
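The WARNING above spells out the exclusion rule implicitly: this cut has 100 frames before subsampling but only 23 after the 4x subsampling, while its text tokenizes to 24 tokens, and a transducer alignment needs at least one frame per token, so the cut is dropped. A hedged sketch of that filter (the exact frame-count formula is an assumption chosen to match the logged 100 -> 23, not the actual train_asr.py code):

    # Hedged sketch of the cut filter implied by the WARNING above.
    def keep_cut(num_frames_before_subsampling, num_tokens, factor=4):
        # Assumed subsampling arithmetic; (100 - 8) // 4 == 23 matches the log.
        frames_after = (num_frames_before_subsampling - 8) // factor
        # An RNN-T alignment needs at least one frame per output token.
        return frames_after >= num_tokens

    print(keep_cut(100, 24))  # False -> excluded, as logged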
2023-11-19 12:07:19,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=728266.6666666666, ans=0.125
2023-11-19 12:07:28,472 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=728333.3333333334, ans=0.125
2023-11-19 12:07:33,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=728333.3333333334, ans=0.125
2023-11-19 12:07:37,668 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 1050, loss[loss=0.08782, simple_loss=0.1028, pruned_loss=0.027, audio_tagging_loss=0.009397, over 15732.00 frames. ], tot_loss[loss=0.08705, simple_loss=0.1063, pruned_loss=0.0235, audio_tagging_loss=0.01042, over 3042326.62 frames. ], batch size: 60, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 12:07:48,356 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.39 vs. limit=22.5
2023-11-19 12:08:27,813 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.65 vs. limit=15.0
2023-11-19 12:08:34,194 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 1100, loss[loss=0.07019, simple_loss=0.08518, pruned_loss=0.01614, audio_tagging_loss=0.01146, over 14534.00 frames. ], tot_loss[loss=0.08692, simple_loss=0.1062, pruned_loss=0.02356, audio_tagging_loss=0.01029, over 3039214.84 frames. ], batch size: 55, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 12:08:34,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=728733.3333333334, ans=0.1
2023-11-19 12:08:35,687 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.39 vs. limit=22.5
2023-11-19 12:08:36,367 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 12:08:42,199 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.898e+01 8.136e+01 8.991e+01 9.834e+01 1.618e+02, threshold=1.798e+02, percent-clipped=0.0
2023-11-19 12:08:57,188 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.79 vs. limit=15.0
2023-11-19 12:09:03,283 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=728866.6666666666, ans=0.0
2023-11-19 12:09:09,161 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=728933.3333333334, ans=0.125
2023-11-19 12:09:27,564 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=729000.0, ans=0.125
2023-11-19 12:09:30,429 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 1150, loss[loss=0.13, simple_loss=0.1637, pruned_loss=0.04072, audio_tagging_loss=0.00738, over 15670.00 frames. ], tot_loss[loss=0.08709, simple_loss=0.1062, pruned_loss=0.02359, audio_tagging_loss=0.01039, over 3036492.77 frames. ], batch size: 57, lr: 7.09e-03, grad_scale: 32.0
2023-11-19 12:09:31,035 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.43 vs. limit=22.5
2023-11-19 12:09:34,272 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 12:09:45,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=729133.3333333334, ans=0.0
2023-11-19 12:09:48,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=729133.3333333334, ans=0.125
2023-11-19 12:09:57,314 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. limit=6.0
2023-11-19 12:10:26,220 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 1200, loss[loss=0.08602, simple_loss=0.1057, pruned_loss=0.02074, audio_tagging_loss=0.01244, over 14844.00 frames. ], tot_loss[loss=0.08606, simple_loss=0.1046, pruned_loss=0.02321, audio_tagging_loss=0.01053, over 3038298.33 frames. ], batch size: 54, lr: 7.08e-03, grad_scale: 32.0
2023-11-19 12:10:31,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=729400.0, ans=0.1
2023-11-19 12:10:35,185 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.319e+01 8.399e+01 9.041e+01 1.012e+02 1.294e+02, threshold=1.808e+02, percent-clipped=0.0
2023-11-19 12:10:53,262 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 12:10:54,565 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=729533.3333333334, ans=6.0
2023-11-19 12:11:09,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=729666.6666666666, ans=0.125
2023-11-19 12:11:15,015 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=729666.6666666666, ans=0.1
2023-11-19 12:11:21,630 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 1250, loss[loss=0.07273, simple_loss=0.07954, pruned_loss=0.02187, audio_tagging_loss=0.01109, over 14295.00 frames. ], tot_loss[loss=0.08605, simple_loss=0.1046, pruned_loss=0.02327, audio_tagging_loss=0.0105, over 3032414.69 frames. ], batch size: 56, lr: 7.08e-03, grad_scale: 32.0
2023-11-19 12:11:29,728 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=729733.3333333334, ans=0.125
2023-11-19 12:11:44,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=729866.6666666666, ans=0.05
2023-11-19 12:11:51,056 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=729866.6666666666, ans=0.0
2023-11-19 12:11:58,825 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=729933.3333333334, ans=0.0
2023-11-19 12:11:59,066 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.31 vs. limit=12.0
2023-11-19 12:12:17,089 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 1300, loss[loss=0.08467, simple_loss=0.1109, pruned_loss=0.01741, audio_tagging_loss=0.01182, over 14538.00 frames. ], tot_loss[loss=0.08562, simple_loss=0.1042, pruned_loss=0.02308, audio_tagging_loss=0.01043, over 3033188.57 frames. ], batch size: 52, lr: 7.08e-03, grad_scale: 16.0
2023-11-19 12:12:24,012 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.05 vs. limit=22.5
2023-11-19 12:12:26,888 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=730066.6666666666, ans=10.0
2023-11-19 12:12:27,300 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=15.0
2023-11-19 12:12:27,633 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.101e+01 8.789e+01 9.869e+01 1.258e+02, threshold=1.758e+02, percent-clipped=0.0
2023-11-19 12:12:38,509 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=730200.0, ans=0.125
2023-11-19 12:12:44,371 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.25 vs. limit=15.0
2023-11-19 12:12:45,305 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=730200.0, ans=0.125
2023-11-19 12:12:46,246 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 12:12:52,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=730266.6666666666, ans=0.1
2023-11-19 12:12:56,533 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=730266.6666666666, ans=0.125
2023-11-19 12:13:12,370 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=730400.0, ans=0.125
2023-11-19 12:13:13,112 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 1350, loss[loss=0.0831, simple_loss=0.09901, pruned_loss=0.02156, audio_tagging_loss=0.01204, over 16044.00 frames. ], tot_loss[loss=0.08489, simple_loss=0.1032, pruned_loss=0.02279, audio_tagging_loss=0.01051, over 3034235.19 frames. ], batch size: 58, lr: 7.08e-03, grad_scale: 16.0
2023-11-19 12:13:31,522 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.01 vs. limit=15.0
2023-11-19 12:13:45,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=730600.0, ans=0.2
2023-11-19 12:13:52,821 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 12:14:08,549 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 1400, loss[loss=0.1232, simple_loss=0.1519, pruned_loss=0.03842, audio_tagging_loss=0.008828, over 16701.00 frames. ], tot_loss[loss=0.08574, simple_loss=0.1047, pruned_loss=0.02296, audio_tagging_loss=0.01043, over 3042826.27 frames. ], batch size: 58, lr: 7.08e-03, grad_scale: 16.0
2023-11-19 12:14:13,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=730733.3333333334, ans=0.2
2023-11-19 12:14:18,522 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.609e+01 8.095e+01 8.801e+01 9.622e+01 1.373e+02, threshold=1.760e+02, percent-clipped=0.0
2023-11-19 12:14:29,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=730866.6666666666, ans=0.0
2023-11-19 12:14:37,163 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.31 vs. limit=10.0
2023-11-19 12:14:42,148 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=730933.3333333334, ans=0.1
2023-11-19 12:15:04,106 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 1450, loss[loss=0.0955, simple_loss=0.125, pruned_loss=0.02581, audio_tagging_loss=0.007165, over 16669.00 frames. ], tot_loss[loss=0.08637, simple_loss=0.1053, pruned_loss=0.0232, audio_tagging_loss=0.01053, over 3045428.04 frames. ], batch size: 61, lr: 7.08e-03, grad_scale: 16.0
2023-11-19 12:15:16,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=731133.3333333334, ans=0.125
2023-11-19 12:15:23,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=731133.3333333334, ans=0.0
2023-11-19 12:15:42,390 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=731266.6666666666, ans=0.0
2023-11-19 12:15:47,630 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=731333.3333333334, ans=0.0
2023-11-19 12:15:55,504 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=731333.3333333334, ans=0.0
2023-11-19 12:15:56,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=731333.3333333334, ans=0.125
2023-11-19 12:16:00,086 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 1500, loss[loss=0.09792, simple_loss=0.123, pruned_loss=0.02816, audio_tagging_loss=0.008273, over 15400.00 frames. ], tot_loss[loss=0.08754, simple_loss=0.1066, pruned_loss=0.02373, audio_tagging_loss=0.01052, over 3047380.03 frames. ], batch size: 58, lr: 7.08e-03, grad_scale: 16.0
2023-11-19 12:16:08,053 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.44 vs. limit=15.0
2023-11-19 12:16:09,648 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.302e+01 8.686e+01 9.376e+01 1.030e+02 1.552e+02, threshold=1.875e+02, percent-clipped=0.0
2023-11-19 12:16:14,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=731466.6666666666, ans=0.0
2023-11-19 12:16:16,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=731466.6666666666, ans=0.125
2023-11-19 12:16:19,552 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=731466.6666666666, ans=0.1
2023-11-19 12:16:23,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=731533.3333333334, ans=0.04949747468305833
2023-11-19 12:16:24,757 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=731533.3333333334, ans=0.0
2023-11-19 12:16:30,024 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 12:16:32,291 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=731600.0, ans=0.125
2023-11-19 12:16:33,724 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=731600.0, ans=15.0
2023-11-19 12:16:38,147 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=731600.0, ans=0.5
2023-11-19 12:16:52,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=731666.6666666666, ans=0.125
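The ScheduledFloat lines report the current value (ans) of a regularization parameter (dropout_p, skip_rate, prob, etc.) as a function of the running batch_count, i.e. these constants are scheduled rather than fixed. A minimal sketch of a piecewise-linear schedule of the kind these lines appear to report (an assumed form for illustration, not the actual icefall ScheduledFloat implementation):

    # Hedged sketch: a value that ramps piecewise-linearly with batch_count.
    def scheduled_value(batch_count, points):
        # points: sorted list of (batch_count, value) breakpoints
        x0, y0 = points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in points[1:]:
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
            x0, y0 = x1, y1
        return y0  # past the last breakpoint the value is held constant

    # Hypothetical schedule: decay from 0.3 to 0.1 over the first 20000 batches.
    print(scheduled_value(731666.67, [(0.0, 0.3), (20000.0, 0.1)]))  # 0.1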
2023-11-19 12:16:55,766 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 1550, loss[loss=0.08335, simple_loss=0.1005, pruned_loss=0.01998, audio_tagging_loss=0.01313, over 14378.00 frames. ], tot_loss[loss=0.08777, simple_loss=0.1066, pruned_loss=0.02388, audio_tagging_loss=0.01058, over 3039571.96 frames. ], batch size: 55, lr: 7.07e-03, grad_scale: 16.0
2023-11-19 12:17:04,928 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=731733.3333333334, ans=0.1
2023-11-19 12:17:20,484 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=731866.6666666666, ans=0.125
2023-11-19 12:17:27,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=731866.6666666666, ans=0.125
2023-11-19 12:17:46,714 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=732000.0, ans=0.125
2023-11-19 12:17:51,797 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 1600, loss[loss=0.08696, simple_loss=0.1007, pruned_loss=0.02505, audio_tagging_loss=0.01156, over 15589.00 frames. ], tot_loss[loss=0.08806, simple_loss=0.1069, pruned_loss=0.02402, audio_tagging_loss=0.0106, over 3044878.06 frames. ], batch size: 59, lr: 7.07e-03, grad_scale: 32.0
2023-11-19 12:18:01,803 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.642e+01 8.544e+01 9.122e+01 1.002e+02 1.471e+02, threshold=1.824e+02, percent-clipped=0.0
2023-11-19 12:18:04,225 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=732133.3333333334, ans=0.125
2023-11-19 12:18:08,704 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.06 vs. limit=22.5
2023-11-19 12:18:15,521 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.75 vs. limit=15.0
2023-11-19 12:18:18,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=732200.0, ans=0.125
2023-11-19 12:18:26,776 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=732266.6666666666, ans=0.125
2023-11-19 12:18:26,780 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=732266.6666666666, ans=0.04949747468305833
2023-11-19 12:18:37,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=732333.3333333334, ans=0.04949747468305833
2023-11-19 12:18:43,618 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.63 vs. limit=22.5
2023-11-19 12:18:47,116 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 1650, loss[loss=0.08189, simple_loss=0.1002, pruned_loss=0.02041, audio_tagging_loss=0.01139, over 14922.00 frames. ], tot_loss[loss=0.08705, simple_loss=0.1056, pruned_loss=0.0236, audio_tagging_loss=0.01066, over 3038145.84 frames. ], batch size: 57, lr: 7.07e-03, grad_scale: 32.0
2023-11-19 12:19:00,785 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.42 vs. limit=12.0
2023-11-19 12:19:12,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=732533.3333333334, ans=0.1
2023-11-19 12:19:29,607 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=732600.0, ans=0.0
2023-11-19 12:19:39,610 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=732666.6666666666, ans=0.125
2023-11-19 12:19:40,764 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=732666.6666666666, ans=0.0
2023-11-19 12:19:42,566 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 1700, loss[loss=0.05483, simple_loss=0.0697, pruned_loss=0.01144, audio_tagging_loss=0.008539, over 14602.00 frames. ], tot_loss[loss=0.08671, simple_loss=0.1051, pruned_loss=0.02338, audio_tagging_loss=0.01076, over 3041299.30 frames. ], batch size: 56, lr: 7.07e-03, grad_scale: 32.0
2023-11-19 12:19:53,003 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.860e+01 8.193e+01 8.787e+01 9.627e+01 1.247e+02, threshold=1.757e+02, percent-clipped=0.0
2023-11-19 12:19:57,589 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 12:20:03,853 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=732866.6666666666, ans=0.0
2023-11-19 12:20:11,894 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=732866.6666666666, ans=0.125
2023-11-19 12:20:13,002 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=732866.6666666666, ans=0.1
2023-11-19 12:20:20,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=732933.3333333334, ans=0.125
2023-11-19 12:20:27,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=733000.0, ans=0.125
2023-11-19 12:20:38,897 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 1750, loss[loss=0.1017, simple_loss=0.125, pruned_loss=0.02921, audio_tagging_loss=0.00992, over 16362.00 frames. ], tot_loss[loss=0.08733, simple_loss=0.1064, pruned_loss=0.02357, audio_tagging_loss=0.01054, over 3036136.74 frames. ], batch size: 60, lr: 7.07e-03, grad_scale: 32.0
2023-11-19 12:20:47,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=733066.6666666666, ans=0.035
2023-11-19 12:20:59,508 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=733200.0, ans=0.0
2023-11-19 12:21:24,877 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=733333.3333333334, ans=0.1
2023-11-19 12:21:34,740 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 1800, loss[loss=0.108, simple_loss=0.1304, pruned_loss=0.03401, audio_tagging_loss=0.008788, over 14458.00 frames. ], tot_loss[loss=0.08654, simple_loss=0.1056, pruned_loss=0.02327, audio_tagging_loss=0.01049, over 3036738.09 frames. ], batch size: 54, lr: 7.07e-03, grad_scale: 32.0
2023-11-19 12:21:44,106 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 8.335e+01 9.001e+01 1.003e+02 1.279e+02, threshold=1.800e+02, percent-clipped=0.0
2023-11-19 12:22:23,575 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=733666.6666666666, ans=0.0
2023-11-19 12:22:29,663 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 1850, loss[loss=0.09373, simple_loss=0.1139, pruned_loss=0.02523, audio_tagging_loss=0.01154, over 15147.00 frames. ], tot_loss[loss=0.0873, simple_loss=0.1066, pruned_loss=0.02363, audio_tagging_loss=0.01035, over 3043071.74 frames. ], batch size: 54, lr: 7.06e-03, grad_scale: 32.0
2023-11-19 12:22:32,794 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0
2023-11-19 12:22:34,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=733733.3333333334, ans=0.1
2023-11-19 12:22:52,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=733866.6666666666, ans=0.125
2023-11-19 12:23:01,741 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=733866.6666666666, ans=0.2
2023-11-19 12:23:24,345 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=734000.0, ans=0.0
2023-11-19 12:23:25,398 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=734066.6666666666, ans=0.125
2023-11-19 12:23:26,246 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 1900, loss[loss=0.08267, simple_loss=0.09927, pruned_loss=0.02331, audio_tagging_loss=0.009724, over 16268.00 frames. ], tot_loss[loss=0.08638, simple_loss=0.1056, pruned_loss=0.02321, audio_tagging_loss=0.01036, over 3047488.89 frames. ], batch size: 60, lr: 7.06e-03, grad_scale: 32.0
2023-11-19 12:23:36,334 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.533e+01 9.158e+01 9.922e+01 1.269e+02, threshold=1.832e+02, percent-clipped=0.0
2023-11-19 12:23:48,451 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=734200.0, ans=15.0
2023-11-19 12:23:50,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=734200.0, ans=0.95
2023-11-19 12:23:56,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=734200.0, ans=0.125
2023-11-19 12:24:05,492 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=734266.6666666666, ans=0.125
2023-11-19 12:24:07,901 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.02 vs. limit=22.5
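The Whitening lines compare a measured covariance-whiteness metric for a module's activations against a scheduled limit; entries like the 12:24:07 one above (metric=24.02 vs. limit=22.5) show the metric exceeding its limit, which is presumably when a corrective penalty would apply. A hedged sketch of one such metric, equal to 1.0 for perfectly white (equal-eigenvalue) features and growing with eigenvalue spread; this is an assumed form for illustration, not necessarily icefall's exact computation:

    # Hedged sketch: a whiteness metric over feature covariance eigenvalues.
    import torch

    def whitening_metric(x):
        # x: (num_frames, num_channels) activations
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        # 1.0 when all eigenvalues are equal; larger when the spectrum is skewed.
        return (eigs.pow(2).mean() / eigs.mean().pow(2)).item()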
limit=22.5 2023-11-19 12:24:21,912 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 1950, loss[loss=0.06034, simple_loss=0.06812, pruned_loss=0.01536, audio_tagging_loss=0.01092, over 14701.00 frames. ], tot_loss[loss=0.08562, simple_loss=0.1048, pruned_loss=0.02292, audio_tagging_loss=0.01032, over 3045319.41 frames. ], batch size: 56, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 12:24:31,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=734466.6666666666, ans=0.0 2023-11-19 12:24:34,070 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.86 vs. limit=15.0 2023-11-19 12:24:36,333 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.60 vs. limit=15.0 2023-11-19 12:24:39,970 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=22.5 2023-11-19 12:24:48,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=734533.3333333334, ans=0.0 2023-11-19 12:25:03,637 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=734600.0, ans=0.2 2023-11-19 12:25:05,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=734666.6666666666, ans=0.2 2023-11-19 12:25:13,475 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=734666.6666666666, ans=0.0 2023-11-19 12:25:16,869 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.49 vs. limit=22.5 2023-11-19 12:25:17,479 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 2000, loss[loss=0.08379, simple_loss=0.1018, pruned_loss=0.02239, audio_tagging_loss=0.01051, over 14966.00 frames. ], tot_loss[loss=0.0854, simple_loss=0.1043, pruned_loss=0.02289, audio_tagging_loss=0.01038, over 3039808.65 frames. ], batch size: 55, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 12:25:17,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=734733.3333333334, ans=0.2 2023-11-19 12:25:23,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=734733.3333333334, ans=0.125 2023-11-19 12:25:24,634 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=734733.3333333334, ans=0.125 2023-11-19 12:25:28,539 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.597e+01 8.084e+01 8.826e+01 9.531e+01 1.443e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-19 12:25:35,356 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.96 vs. 
limit=15.0 2023-11-19 12:25:50,479 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=734933.3333333334, ans=0.125 2023-11-19 12:25:54,662 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=734933.3333333334, ans=0.0 2023-11-19 12:26:02,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=735000.0, ans=0.2 2023-11-19 12:26:13,999 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 2050, loss[loss=0.08622, simple_loss=0.1022, pruned_loss=0.02273, audio_tagging_loss=0.01238, over 14671.00 frames. ], tot_loss[loss=0.08513, simple_loss=0.1039, pruned_loss=0.02274, audio_tagging_loss=0.01042, over 3039408.52 frames. ], batch size: 56, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 12:26:22,048 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.94 vs. limit=15.0 2023-11-19 12:26:27,518 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=22.5 2023-11-19 12:26:37,136 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.68 vs. limit=10.0 2023-11-19 12:26:47,960 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=735266.6666666666, ans=0.0 2023-11-19 12:26:50,178 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=735266.6666666666, ans=0.5 2023-11-19 12:27:09,241 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 2100, loss[loss=0.09482, simple_loss=0.1114, pruned_loss=0.02795, audio_tagging_loss=0.01119, over 16144.00 frames. ], tot_loss[loss=0.08527, simple_loss=0.1041, pruned_loss=0.02279, audio_tagging_loss=0.01042, over 3039605.43 frames. ], batch size: 59, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 12:27:18,793 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.692e+01 8.614e+01 9.149e+01 1.029e+02 1.234e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-19 12:27:26,434 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=735466.6666666666, ans=0.125 2023-11-19 12:27:51,715 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=735600.0, ans=0.125 2023-11-19 12:27:54,920 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=735666.6666666666, ans=0.0 2023-11-19 12:28:04,277 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 2150, loss[loss=0.07989, simple_loss=0.09088, pruned_loss=0.02185, audio_tagging_loss=0.0126, over 14808.00 frames. ], tot_loss[loss=0.08552, simple_loss=0.1042, pruned_loss=0.02289, audio_tagging_loss=0.01051, over 3035305.59 frames. 
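In the optim.py:476 records, the logged threshold is consistently `Clipping_scale` (2.0) times the median of the reported grad-norm quartiles (e.g. 2.0 x 9.149e+01 = 1.830e+02 just above). A sketch of that bookkeeping, assuming a sliding window of recent gradient norms; the window size and logging details are assumptions:

    import torch
    from collections import deque

    class GradNormClipper:
        """Clip at clipping_scale * median(recent grad norms) and report the
        min/25%/50%/75%/max quartiles, like the optim.py lines above."""
        def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
            self.scale = clipping_scale
            self.norms = deque(maxlen=window)
            self.num_clipped = self.num_seen = 0

        def __call__(self, parameters) -> float:
            grads = [p.grad for p in parameters if p.grad is not None]
            norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
            self.norms.append(norm)
            q = torch.quantile(torch.tensor(list(self.norms)),
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.scale * q[2].item()  # 2.0 x median
            self.num_seen += 1
            if norm > threshold:
                self.num_clipped += 1
                for g in grads:
                    g.mul_(threshold / norm)
            print(f"grad-norm quartiles "
                  f"{' '.join(f'{v:.3e}' for v in q.tolist())}, "
                  f"threshold={threshold:.3e}, percent-clipped="
                  f"{100.0 * self.num_clipped / self.num_seen:.1f}")
            return norm
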
], batch size: 55, lr: 7.05e-03, grad_scale: 32.0 2023-11-19 12:28:11,355 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=735733.3333333334, ans=0.1 2023-11-19 12:28:22,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=735800.0, ans=0.125 2023-11-19 12:28:35,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=735866.6666666666, ans=0.125 2023-11-19 12:28:38,261 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 12:28:42,493 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=735933.3333333334, ans=0.015 2023-11-19 12:28:51,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=736000.0, ans=0.125 2023-11-19 12:28:51,795 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.41 vs. limit=22.5 2023-11-19 12:29:00,632 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 2200, loss[loss=0.1226, simple_loss=0.1506, pruned_loss=0.03901, audio_tagging_loss=0.008325, over 14938.00 frames. ], tot_loss[loss=0.08628, simple_loss=0.1052, pruned_loss=0.0232, audio_tagging_loss=0.01047, over 3036754.07 frames. 
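The WARNING above shows the short-utterance guard at work: a 1-second AudioSet clip ("_0.000_1.000.wav") has 100 feature frames, only 23 frames after the encoder's roughly 4x subsampling, which is fewer than the 24 BPE tokens of its dummy transcript, so a transducer loss cannot align it and the cut is excluded. A self-contained sketch of that check; the subsampling arithmetic (two stride-2 stages) is assumed from the 100 -> 23 numbers in the log:

    import logging

    def frames_after_subsampling(num_frames: int) -> int:
        # assumed Conv2dSubsampling arithmetic: two stride-2 convolutions
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(cut_id: str, num_frames: int, tokens: list) -> bool:
        T = frames_after_subsampling(num_frames)
        if T < len(tokens):
            logging.warning(
                f"Exclude cut with ID {cut_id} from training. "
                f"Number of frames (before subsampling): {num_frames}. "
                f"Number of frames (after subsampling): {T}. "
                f"Number of tokens: {len(tokens)}")
            return False
        return True

    # the 1 s AudioSet clips in the log: 100 frames -> 23 < 24 dummy-text tokens
    assert frames_after_subsampling(100) == 23
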
], batch size: 55, lr: 7.05e-03, grad_scale: 16.0 2023-11-19 12:29:06,933 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=736066.6666666666, ans=0.0 2023-11-19 12:29:07,923 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=736066.6666666666, ans=0.125 2023-11-19 12:29:11,898 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.518e+01 8.663e+01 9.454e+01 1.034e+02 1.518e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-19 12:29:22,078 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=736200.0, ans=0.125 2023-11-19 12:29:23,145 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=736200.0, ans=0.0 2023-11-19 12:29:25,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=736200.0, ans=0.09899494936611666 2023-11-19 12:29:27,388 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=736200.0, ans=0.0 2023-11-19 12:29:41,243 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=736266.6666666666, ans=0.2 2023-11-19 12:29:51,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=736333.3333333334, ans=0.2 2023-11-19 12:29:56,586 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 2250, loss[loss=0.06172, simple_loss=0.0741, pruned_loss=0.01092, audio_tagging_loss=0.01375, over 14843.00 frames. ], tot_loss[loss=0.08578, simple_loss=0.1047, pruned_loss=0.02291, audio_tagging_loss=0.0105, over 3027674.62 frames. ], batch size: 55, lr: 7.05e-03, grad_scale: 16.0 2023-11-19 12:30:04,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=736400.0, ans=0.05 2023-11-19 12:30:08,220 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.08 vs. limit=15.0 2023-11-19 12:30:16,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=736466.6666666666, ans=0.0 2023-11-19 12:30:17,316 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=736533.3333333334, ans=0.0 2023-11-19 12:30:29,109 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=736600.0, ans=0.0 2023-11-19 12:30:29,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=736600.0, ans=0.1 2023-11-19 12:30:31,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=736600.0, ans=0.1 2023-11-19 12:30:36,457 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.85 vs. 
limit=6.0 2023-11-19 12:30:37,857 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=736600.0, ans=0.0 2023-11-19 12:30:38,426 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.59 vs. limit=15.0 2023-11-19 12:30:51,734 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 2300, loss[loss=0.08878, simple_loss=0.1072, pruned_loss=0.0253, audio_tagging_loss=0.009898, over 15782.00 frames. ], tot_loss[loss=0.08606, simple_loss=0.1049, pruned_loss=0.02309, audio_tagging_loss=0.01053, over 3034757.39 frames. ], batch size: 60, lr: 7.05e-03, grad_scale: 8.0 2023-11-19 12:31:03,999 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.615e+01 8.238e+01 9.177e+01 1.028e+02 1.469e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-19 12:31:24,904 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=736933.3333333334, ans=0.125 2023-11-19 12:31:27,987 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=736933.3333333334, ans=0.125 2023-11-19 12:31:40,614 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 12:31:48,023 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 2350, loss[loss=0.09256, simple_loss=0.1258, pruned_loss=0.0195, audio_tagging_loss=0.01018, over 15804.00 frames. ], tot_loss[loss=0.08606, simple_loss=0.1047, pruned_loss=0.02311, audio_tagging_loss=0.01062, over 3035501.67 frames. ], batch size: 57, lr: 7.05e-03, grad_scale: 8.0 2023-11-19 12:31:58,841 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=737133.3333333334, ans=0.04949747468305833 2023-11-19 12:32:04,095 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=737133.3333333334, ans=0.0 2023-11-19 12:32:12,369 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=737200.0, ans=0.1 2023-11-19 12:32:16,529 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=737200.0, ans=0.1 2023-11-19 12:32:43,224 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 2400, loss[loss=0.07788, simple_loss=0.09393, pruned_loss=0.01816, audio_tagging_loss=0.01275, over 15195.00 frames. ], tot_loss[loss=0.08625, simple_loss=0.105, pruned_loss=0.02311, audio_tagging_loss=0.01066, over 3044982.90 frames. ], batch size: 56, lr: 7.05e-03, grad_scale: 16.0 2023-11-19 12:32:55,918 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.415e+01 9.190e+01 1.007e+02 1.395e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-19 12:33:04,755 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.50 vs. 
limit=15.0 2023-11-19 12:33:11,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=737533.3333333334, ans=0.1 2023-11-19 12:33:21,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=737600.0, ans=0.2 2023-11-19 12:33:33,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=737666.6666666666, ans=0.2 2023-11-19 12:33:38,940 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 2450, loss[loss=0.0807, simple_loss=0.08692, pruned_loss=0.02028, audio_tagging_loss=0.01696, over 16229.00 frames. ], tot_loss[loss=0.08585, simple_loss=0.1043, pruned_loss=0.02293, audio_tagging_loss=0.01075, over 3048352.35 frames. ], batch size: 63, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 12:33:45,838 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.70 vs. limit=12.0 2023-11-19 12:34:02,367 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=737866.6666666666, ans=0.0 2023-11-19 12:34:33,836 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 2500, loss[loss=0.06958, simple_loss=0.07614, pruned_loss=0.01752, audio_tagging_loss=0.01399, over 15036.00 frames. ], tot_loss[loss=0.08611, simple_loss=0.1049, pruned_loss=0.02297, audio_tagging_loss=0.01067, over 3053664.83 frames. ], batch size: 59, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 12:34:37,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=738066.6666666666, ans=0.0 2023-11-19 12:34:39,794 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=738066.6666666666, ans=0.1 2023-11-19 12:34:44,033 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=738133.3333333334, ans=0.0 2023-11-19 12:34:45,794 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 8.644e+01 9.382e+01 1.016e+02 1.260e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-19 12:35:28,381 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=738400.0, ans=0.125 2023-11-19 12:35:29,254 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 2550, loss[loss=0.09354, simple_loss=0.1101, pruned_loss=0.02832, audio_tagging_loss=0.01018, over 14401.00 frames. ], tot_loss[loss=0.08579, simple_loss=0.1045, pruned_loss=0.02299, audio_tagging_loss=0.01055, over 3048865.16 frames. ], batch size: 57, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 12:35:43,570 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.59 vs. 
limit=22.5 2023-11-19 12:35:46,447 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=738466.6666666666, ans=0.1 2023-11-19 12:36:13,351 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=738666.6666666666, ans=0.125 2023-11-19 12:36:17,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=738666.6666666666, ans=0.0 2023-11-19 12:36:20,457 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=738666.6666666666, ans=0.125 2023-11-19 12:36:25,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=738733.3333333334, ans=0.1 2023-11-19 12:36:26,093 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 2600, loss[loss=0.08652, simple_loss=0.1003, pruned_loss=0.0237, audio_tagging_loss=0.01266, over 15723.00 frames. ], tot_loss[loss=0.08562, simple_loss=0.1043, pruned_loss=0.02301, audio_tagging_loss=0.01046, over 3048714.09 frames. ], batch size: 59, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 12:36:32,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=738733.3333333334, ans=0.0 2023-11-19 12:36:33,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=738733.3333333334, ans=0.0 2023-11-19 12:36:37,711 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.428e+01 9.293e+01 1.021e+02 1.415e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 12:36:59,507 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=738933.3333333334, ans=0.125 2023-11-19 12:37:08,032 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=738933.3333333334, ans=15.0 2023-11-19 12:37:14,985 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=739000.0, ans=0.125 2023-11-19 12:37:21,573 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 2650, loss[loss=0.1228, simple_loss=0.1395, pruned_loss=0.04289, audio_tagging_loss=0.01012, over 15627.00 frames. ], tot_loss[loss=0.08611, simple_loss=0.1048, pruned_loss=0.02327, audio_tagging_loss=0.01042, over 3043799.62 frames. ], batch size: 57, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 12:37:21,793 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=739066.6666666666, ans=0.125 2023-11-19 12:37:30,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=739066.6666666666, ans=0.1 2023-11-19 12:37:36,228 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.38 vs. 
limit=15.0 2023-11-19 12:38:02,505 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=739266.6666666666, ans=0.125 2023-11-19 12:38:08,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=739333.3333333334, ans=0.0 2023-11-19 12:38:10,777 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=739333.3333333334, ans=0.0 2023-11-19 12:38:16,940 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 2700, loss[loss=0.07844, simple_loss=0.09809, pruned_loss=0.02021, audio_tagging_loss=0.009183, over 15359.00 frames. ], tot_loss[loss=0.08544, simple_loss=0.104, pruned_loss=0.02304, audio_tagging_loss=0.01039, over 3045624.39 frames. ], batch size: 56, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 12:38:17,717 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.41 vs. limit=15.0 2023-11-19 12:38:21,976 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=739400.0, ans=0.0 2023-11-19 12:38:24,065 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=739400.0, ans=0.0 2023-11-19 12:38:29,021 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.043e+01 8.415e+01 9.342e+01 1.060e+02 2.991e+02, threshold=1.868e+02, percent-clipped=1.0 2023-11-19 12:38:42,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=739533.3333333334, ans=0.125 2023-11-19 12:38:48,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=739533.3333333334, ans=0.0 2023-11-19 12:39:08,575 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=22.5 2023-11-19 12:39:12,475 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 2750, loss[loss=0.0869, simple_loss=0.101, pruned_loss=0.02399, audio_tagging_loss=0.0124, over 15470.00 frames. ], tot_loss[loss=0.08522, simple_loss=0.1036, pruned_loss=0.02292, audio_tagging_loss=0.01052, over 3045953.64 frames. ], batch size: 58, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 12:39:15,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=739733.3333333334, ans=0.1 2023-11-19 12:39:27,098 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=739800.0, ans=0.125 2023-11-19 12:39:35,606 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=739866.6666666666, ans=0.1 2023-11-19 12:39:50,396 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.49 vs. limit=15.0 2023-11-19 12:39:59,483 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 12:39:59,769 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=740000.0, ans=0.0 2023-11-19 12:40:08,438 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 2800, loss[loss=0.1285, simple_loss=0.1633, pruned_loss=0.03895, audio_tagging_loss=0.007876, over 15794.00 frames. ], tot_loss[loss=0.08535, simple_loss=0.1041, pruned_loss=0.02284, audio_tagging_loss=0.01047, over 3044924.18 frames. ], batch size: 56, lr: 7.03e-03, grad_scale: 32.0 2023-11-19 12:40:21,269 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.890e+01 8.368e+01 8.840e+01 9.465e+01 1.289e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-19 12:40:29,866 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.35 vs. limit=15.0 2023-11-19 12:40:32,652 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=740200.0, ans=0.125 2023-11-19 12:41:04,777 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 2850, loss[loss=0.07153, simple_loss=0.09375, pruned_loss=0.01562, audio_tagging_loss=0.009038, over 14077.00 frames. ], tot_loss[loss=0.08491, simple_loss=0.1036, pruned_loss=0.02272, audio_tagging_loss=0.01039, over 3045647.40 frames. ], batch size: 54, lr: 7.03e-03, grad_scale: 32.0 2023-11-19 12:41:15,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=740466.6666666666, ans=0.125 2023-11-19 12:41:18,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=740466.6666666666, ans=0.125 2023-11-19 12:41:24,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=740466.6666666666, ans=0.015 2023-11-19 12:41:29,964 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.19 vs. limit=15.0 2023-11-19 12:41:46,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=740600.0, ans=0.1 2023-11-19 12:41:54,487 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.96 vs. limit=15.0 2023-11-19 12:42:00,347 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 2900, loss[loss=0.09456, simple_loss=0.1129, pruned_loss=0.02906, audio_tagging_loss=0.009063, over 14245.00 frames. ], tot_loss[loss=0.08464, simple_loss=0.1031, pruned_loss=0.02261, audio_tagging_loss=0.0105, over 3039711.48 frames. ], batch size: 58, lr: 7.03e-03, grad_scale: 32.0 2023-11-19 12:42:04,711 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=740733.3333333334, ans=0.0 2023-11-19 12:42:12,381 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.633e+01 9.349e+01 9.901e+01 1.327e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-19 12:42:32,893 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.02 vs. 
limit=15.0 2023-11-19 12:42:55,860 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 2950, loss[loss=0.07589, simple_loss=0.09175, pruned_loss=0.01823, audio_tagging_loss=0.01179, over 15724.00 frames. ], tot_loss[loss=0.08587, simple_loss=0.1051, pruned_loss=0.02298, audio_tagging_loss=0.01036, over 3042525.58 frames. ], batch size: 58, lr: 7.03e-03, grad_scale: 32.0 2023-11-19 12:43:24,030 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=741200.0, ans=0.125 2023-11-19 12:43:26,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=741200.0, ans=0.2 2023-11-19 12:43:35,839 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=741266.6666666666, ans=0.0 2023-11-19 12:43:39,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=741333.3333333334, ans=0.125 2023-11-19 12:43:52,231 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 3000, loss[loss=0.08617, simple_loss=0.0962, pruned_loss=0.02573, audio_tagging_loss=0.01234, over 14937.00 frames. ], tot_loss[loss=0.08661, simple_loss=0.1059, pruned_loss=0.02324, audio_tagging_loss=0.0104, over 3045407.00 frames. ], batch size: 55, lr: 7.03e-03, grad_scale: 32.0 2023-11-19 12:43:52,232 INFO [train_asr.py:1138] (3/4) Computing validation loss 2023-11-19 12:44:08,034 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.8907, 5.6876, 5.3169, 5.4736], device='cuda:3') 2023-11-19 12:44:24,105 INFO [train_asr.py:1147] (3/4) Epoch 10, validation: loss=0.06403, simple_loss=0.05543, pruned_loss=0.006395, audio_tagging_loss=0.02992, over 4681554.00 frames. 2023-11-19 12:44:24,105 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB 2023-11-19 12:44:25,614 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.78 vs. limit=15.0 2023-11-19 12:44:28,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=741400.0, ans=0.0 2023-11-19 12:44:35,740 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.397e+01 8.400e+01 9.242e+01 1.017e+02 1.416e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-19 12:44:54,023 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=741533.3333333334, ans=0.125 2023-11-19 12:45:13,723 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:45:19,200 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 3050, loss[loss=0.09022, simple_loss=0.111, pruned_loss=0.02436, audio_tagging_loss=0.01036, over 14633.00 frames. ], tot_loss[loss=0.08721, simple_loss=0.1065, pruned_loss=0.02348, audio_tagging_loss=0.01046, over 3046712.21 frames. 
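The `tot_loss[... over N frames]` aggregates above hover around 3.0e6 frames with fractional counts, which points to an exponentially decayed running sum rather than a hard-reset average: with roughly 15k frames per batch and a decay horizon of about 200 batches, the steady-state frame mass is ~3e6, matching the log. A toy sketch of that accumulation; the horizon of 200 is an assumption read off those numbers:

    import random

    class MetricsTracker(dict):
        def __add__(self, other):
            keys = set(self) | set(other)
            return MetricsTracker(
                {k: self.get(k, 0.0) + other.get(k, 0.0) for k in keys})
        def __mul__(self, alpha: float):
            return MetricsTracker({k: v * alpha for k, v in self.items()})

    tot_loss = MetricsTracker()
    for step in range(500):
        batch = MetricsTracker(frames=random.uniform(14000.0, 16500.0),
                               loss=random.uniform(1200.0, 1500.0))
        tot_loss = tot_loss * (1 - 1 / 200) + batch  # ~200-batch decay horizon
    print(f"loss={tot_loss['loss'] / tot_loss['frames']:.5f}, "
          f"over {tot_loss['frames']:.2f} frames")  # ~3.0e6 frames at steady state
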
], batch size: 55, lr: 7.03e-03, grad_scale: 32.0 2023-11-19 12:45:43,205 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=741866.6666666666, ans=0.125 2023-11-19 12:45:49,446 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=741866.6666666666, ans=0.125 2023-11-19 12:45:51,382 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 12:45:59,135 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=741933.3333333334, ans=0.125 2023-11-19 12:46:14,658 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 3100, loss[loss=0.0675, simple_loss=0.07637, pruned_loss=0.0153, audio_tagging_loss=0.01401, over 15650.00 frames. ], tot_loss[loss=0.08806, simple_loss=0.1075, pruned_loss=0.02387, audio_tagging_loss=0.01045, over 3051731.18 frames. ], batch size: 58, lr: 7.02e-03, grad_scale: 32.0 2023-11-19 12:46:26,933 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.581e+01 8.703e+01 9.646e+01 1.063e+02 1.410e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-19 12:46:29,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=742133.3333333334, ans=0.0 2023-11-19 12:46:30,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=742133.3333333334, ans=0.125 2023-11-19 12:47:10,460 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 3150, loss[loss=0.1024, simple_loss=0.1328, pruned_loss=0.02744, audio_tagging_loss=0.008568, over 15896.00 frames. ], tot_loss[loss=0.0888, simple_loss=0.1085, pruned_loss=0.02415, audio_tagging_loss=0.01042, over 3053986.10 frames. ], batch size: 59, lr: 7.02e-03, grad_scale: 16.0 2023-11-19 12:47:17,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=742400.0, ans=0.125 2023-11-19 12:47:28,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=742466.6666666666, ans=0.125 2023-11-19 12:47:29,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=742466.6666666666, ans=0.0 2023-11-19 12:47:33,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=742533.3333333334, ans=0.0 2023-11-19 12:47:52,172 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=742600.0, ans=0.0 2023-11-19 12:48:04,202 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=742666.6666666666, ans=0.125 2023-11-19 12:48:06,057 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 3200, loss[loss=0.09329, simple_loss=0.1174, pruned_loss=0.02329, audio_tagging_loss=0.01129, over 14822.00 frames. 
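The `grad_scale` field in the batch lines moves between 8.0, 16.0 and 32.0 across this stretch; the power-of-two values are the signature of dynamic fp16 loss scaling, which halves the scale on an inf/nan gradient and doubles it after a long run of clean steps. A generic torch.cuda.amp sketch of that loop (the linear model is a stand-in, not the Zipformer; requires a CUDA device, as in this run):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=2.0 ** 10)
    model = torch.nn.Linear(80, 500).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=7e-3)

    for _ in range(10):
        x = torch.randn(8, 80, device="cuda")
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(x).square().mean()
        opt.zero_grad()
        scaler.scale(loss).backward()
        scaler.step(opt)   # skips the update if the grads overflowed
        scaler.update()    # halves on overflow; doubles after 2000 clean steps
        print("grad_scale:", scaler.get_scale())
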
], tot_loss[loss=0.08825, simple_loss=0.1078, pruned_loss=0.02379, audio_tagging_loss=0.01058, over 3061731.09 frames. ], batch size: 54, lr: 7.02e-03, grad_scale: 32.0 2023-11-19 12:48:16,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=742733.3333333334, ans=6.0 2023-11-19 12:48:18,703 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=742800.0, ans=0.2 2023-11-19 12:48:20,447 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.457e+01 8.995e+01 9.869e+01 1.157e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-19 12:48:20,996 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.98 vs. limit=22.5 2023-11-19 12:48:44,749 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.83 vs. limit=15.0 2023-11-19 12:48:53,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=743000.0, ans=0.0 2023-11-19 12:49:02,760 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 3250, loss[loss=0.07326, simple_loss=0.0952, pruned_loss=0.01711, audio_tagging_loss=0.008553, over 16059.00 frames. ], tot_loss[loss=0.08719, simple_loss=0.1064, pruned_loss=0.02329, audio_tagging_loss=0.0107, over 3061411.58 frames. ], batch size: 62, lr: 7.02e-03, grad_scale: 32.0 2023-11-19 12:49:09,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=743066.6666666666, ans=0.1 2023-11-19 12:49:11,384 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=743066.6666666666, ans=0.04949747468305833 2023-11-19 12:49:24,850 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.15 vs. limit=6.0 2023-11-19 12:49:30,136 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.81 vs. limit=15.0 2023-11-19 12:49:49,175 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=743333.3333333334, ans=0.2 2023-11-19 12:49:50,834 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.60 vs. limit=15.0 2023-11-19 12:49:52,526 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=743333.3333333334, ans=0.125 2023-11-19 12:49:57,961 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 3300, loss[loss=0.07065, simple_loss=0.08509, pruned_loss=0.01605, audio_tagging_loss=0.01206, over 16244.00 frames. ], tot_loss[loss=0.08764, simple_loss=0.107, pruned_loss=0.02354, audio_tagging_loss=0.01062, over 3064828.06 frames. 
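The per-batch `loss` fields throughout this section are consistent with a weighted sum of the three components, loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss (the scales 0.5 and 1.0 are inferred from the logged numbers, not quoted from code):

    def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                      simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
        return (simple_loss_scale * simple_loss + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    # e.g. the Epoch 10, batch 1700 line:
    # 0.5 * 0.0697 + 0.01144 + 0.008539 = 0.05483
    assert abs(combined_loss(0.0697, 0.01144, 0.008539) - 0.05483) < 1e-4
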
], batch size: 62, lr: 7.02e-03, grad_scale: 32.0 2023-11-19 12:50:01,372 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=743400.0, ans=0.05 2023-11-19 12:50:10,521 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.824e+01 8.235e+01 8.992e+01 9.663e+01 1.572e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 12:50:16,635 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=743466.6666666666, ans=0.0 2023-11-19 12:50:21,586 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2023-11-19 12:50:35,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=743600.0, ans=0.0 2023-11-19 12:50:42,531 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=743666.6666666666, ans=0.125 2023-11-19 12:50:49,004 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.38 vs. limit=15.0 2023-11-19 12:50:52,828 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 3350, loss[loss=0.07607, simple_loss=0.09985, pruned_loss=0.01844, audio_tagging_loss=0.007707, over 16384.00 frames. ], tot_loss[loss=0.08708, simple_loss=0.1069, pruned_loss=0.02327, audio_tagging_loss=0.01039, over 3069067.49 frames. ], batch size: 60, lr: 7.02e-03, grad_scale: 16.0 2023-11-19 12:50:56,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=743733.3333333334, ans=0.2 2023-11-19 12:50:59,127 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.27 vs. limit=10.0 2023-11-19 12:51:02,362 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=743733.3333333334, ans=0.125 2023-11-19 12:51:14,664 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=743866.6666666666, ans=0.2 2023-11-19 12:51:30,063 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=743933.3333333334, ans=0.125 2023-11-19 12:51:33,162 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=743933.3333333334, ans=0.0 2023-11-19 12:51:35,298 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=743933.3333333334, ans=0.125 2023-11-19 12:51:48,121 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=744066.6666666666, ans=0.125 2023-11-19 12:51:49,086 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 3400, loss[loss=0.06638, simple_loss=0.07626, pruned_loss=0.01866, audio_tagging_loss=0.009593, over 14318.00 frames. ], tot_loss[loss=0.08668, simple_loss=0.1063, pruned_loss=0.02323, audio_tagging_loss=0.01028, over 3063363.73 frames. 
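The scaling.py:1022 "Whitening" records fire when a module's whitening metric exceeds its limit; the metric is ~1.0 for perfectly white (isotropic) activations and grows as channels within a group become correlated. A sketch under an assumed definition (mean squared eigenvalue of the channel covariance over its squared mean, computed without an eigendecomposition; not necessarily scaling.py's exact formula):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels); channels are split into groups,
        # as in the num_groups/num_channels fields of the log lines
        metrics = []
        for c in x.chunk(num_groups, dim=-1):
            c = c - c.mean(dim=0)
            cov = (c.T @ c) / c.shape[0]
            d = cov.shape[0]
            # mean(eig^2) / mean(eig)^2 = d * ||C||_F^2 / trace(C)^2
            metrics.append((d * (cov * cov).sum() / cov.trace() ** 2).item())
        return sum(metrics) / len(metrics)

    print(whitening_metric(torch.randn(1000, 256)))  # ~1.0+ for near-white input
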
], batch size: 55, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 12:51:56,268 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=744066.6666666666, ans=0.125 2023-11-19 12:51:58,436 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=744066.6666666666, ans=0.125 2023-11-19 12:52:04,026 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.252e+01 8.485e+01 9.102e+01 1.010e+02 1.792e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-19 12:52:08,525 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:52:14,819 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=744200.0, ans=0.0 2023-11-19 12:52:17,328 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.72 vs. limit=12.0 2023-11-19 12:52:29,969 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.93 vs. limit=15.0 2023-11-19 12:52:37,617 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=744333.3333333334, ans=0.125 2023-11-19 12:52:43,205 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.41 vs. limit=12.0 2023-11-19 12:52:45,405 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 3450, loss[loss=0.0799, simple_loss=0.1025, pruned_loss=0.01895, audio_tagging_loss=0.009706, over 14985.00 frames. ], tot_loss[loss=0.08642, simple_loss=0.1061, pruned_loss=0.02312, audio_tagging_loss=0.01028, over 3053988.85 frames. ], batch size: 58, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 12:52:48,725 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=744400.0, ans=0.1 2023-11-19 12:52:50,017 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.83 vs. limit=10.0 2023-11-19 12:52:58,290 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=744466.6666666666, ans=0.0 2023-11-19 12:53:05,701 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0 2023-11-19 12:53:08,101 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.79 vs. limit=6.0 2023-11-19 12:53:15,666 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.88 vs. limit=6.0 2023-11-19 12:53:23,282 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=744600.0, ans=0.2 2023-11-19 12:53:40,479 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 3500, loss[loss=0.09404, simple_loss=0.1173, pruned_loss=0.02372, audio_tagging_loss=0.01168, over 16401.00 frames. ], tot_loss[loss=0.08676, simple_loss=0.1066, pruned_loss=0.02326, audio_tagging_loss=0.0102, over 3054093.55 frames. 
], batch size: 62, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 12:53:42,861 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=744733.3333333334, ans=0.1 2023-11-19 12:53:55,412 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.325e+01 9.039e+01 9.791e+01 1.416e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 12:54:08,666 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 12:54:13,872 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=744933.3333333334, ans=0.1 2023-11-19 12:54:16,125 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.96 vs. limit=22.5 2023-11-19 12:54:24,307 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=745000.0, ans=0.0 2023-11-19 12:54:27,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=745000.0, ans=0.0 2023-11-19 12:54:36,225 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.58 vs. limit=6.0 2023-11-19 12:54:36,801 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 3550, loss[loss=0.09446, simple_loss=0.1135, pruned_loss=0.0247, audio_tagging_loss=0.01301, over 15681.00 frames. ], tot_loss[loss=0.08636, simple_loss=0.1061, pruned_loss=0.02315, audio_tagging_loss=0.01015, over 3053815.38 frames. ], batch size: 59, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 12:54:57,518 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.23 vs. limit=12.0 2023-11-19 12:55:03,687 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.61 vs. limit=15.0 2023-11-19 12:55:13,948 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=745266.6666666666, ans=0.0 2023-11-19 12:55:25,770 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:55:31,986 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 3600, loss[loss=0.07055, simple_loss=0.08246, pruned_loss=0.01596, audio_tagging_loss=0.01337, over 14273.00 frames. ], tot_loss[loss=0.08636, simple_loss=0.1059, pruned_loss=0.02321, audio_tagging_loss=0.0102, over 3049294.92 frames. ], batch size: 55, lr: 7.01e-03, grad_scale: 32.0 2023-11-19 12:55:39,680 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=745400.0, ans=0.0 2023-11-19 12:55:42,982 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.93 vs. 
limit=10.0 2023-11-19 12:55:44,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=745466.6666666666, ans=0.1 2023-11-19 12:55:45,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=745466.6666666666, ans=0.125 2023-11-19 12:55:46,773 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.231e+01 8.451e+01 8.878e+01 9.732e+01 1.759e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-19 12:55:51,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=745466.6666666666, ans=0.0 2023-11-19 12:56:04,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=745533.3333333334, ans=0.1 2023-11-19 12:56:27,475 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 3650, loss[loss=0.08808, simple_loss=0.1089, pruned_loss=0.02238, audio_tagging_loss=0.01123, over 16053.00 frames. ], tot_loss[loss=0.08553, simple_loss=0.1049, pruned_loss=0.02288, audio_tagging_loss=0.01023, over 3049728.12 frames. ], batch size: 60, lr: 7.01e-03, grad_scale: 32.0 2023-11-19 12:56:30,897 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=745733.3333333334, ans=0.0 2023-11-19 12:56:45,576 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=745800.0, ans=0.1 2023-11-19 12:56:47,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=745800.0, ans=0.125 2023-11-19 12:56:58,915 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=745866.6666666666, ans=0.0 2023-11-19 12:57:02,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=745933.3333333334, ans=0.2 2023-11-19 12:57:17,822 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=746000.0, ans=0.2 2023-11-19 12:57:23,635 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 3700, loss[loss=0.0794, simple_loss=0.09928, pruned_loss=0.02136, audio_tagging_loss=0.008405, over 15050.00 frames. ], tot_loss[loss=0.08585, simple_loss=0.1052, pruned_loss=0.02303, audio_tagging_loss=0.01021, over 3052497.46 frames. ], batch size: 55, lr: 7.01e-03, grad_scale: 32.0 2023-11-19 12:57:37,612 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.774e+01 9.543e+01 1.094e+02 1.492e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-19 12:57:48,502 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=746200.0, ans=0.125 2023-11-19 12:58:04,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=746266.6666666666, ans=0.1 2023-11-19 12:58:11,353 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.26 vs. limit=15.0 2023-11-19 12:58:19,028 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 3750, loss[loss=0.1023, simple_loss=0.128, pruned_loss=0.02887, audio_tagging_loss=0.009449, over 15584.00 frames. 
], tot_loss[loss=0.08582, simple_loss=0.1052, pruned_loss=0.023, audio_tagging_loss=0.01022, over 3056385.28 frames. ], batch size: 56, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 12:58:26,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=746400.0, ans=0.2 2023-11-19 12:58:38,691 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.85 vs. limit=22.5 2023-11-19 12:58:42,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=746533.3333333334, ans=0.0 2023-11-19 12:58:56,952 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 12:58:58,207 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=746600.0, ans=0.125 2023-11-19 12:58:59,191 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=746600.0, ans=0.1 2023-11-19 12:59:10,862 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=746666.6666666666, ans=0.1 2023-11-19 12:59:13,485 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:59:15,944 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.39 vs. limit=15.0 2023-11-19 12:59:17,447 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 3800, loss[loss=0.09695, simple_loss=0.1152, pruned_loss=0.02916, audio_tagging_loss=0.01019, over 15841.00 frames. ], tot_loss[loss=0.0855, simple_loss=0.1047, pruned_loss=0.02284, audio_tagging_loss=0.01031, over 3059748.60 frames. ], batch size: 57, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 12:59:32,189 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.736e+01 8.389e+01 8.999e+01 1.009e+02 1.684e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-19 12:59:35,561 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=746800.0, ans=0.125 2023-11-19 12:59:39,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=746866.6666666666, ans=0.1 2023-11-19 12:59:46,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=746866.6666666666, ans=0.1 2023-11-19 12:59:54,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=746933.3333333334, ans=0.125 2023-11-19 13:00:13,361 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 3850, loss[loss=0.09499, simple_loss=0.1186, pruned_loss=0.02716, audio_tagging_loss=0.008529, over 14397.00 frames. 
], tot_loss[loss=0.08503, simple_loss=0.1039, pruned_loss=0.02262, audio_tagging_loss=0.01048, over 3054666.93 frames. ], batch size: 56, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 13:00:15,789 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=747066.6666666666, ans=0.0 2023-11-19 13:00:32,037 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=747133.3333333334, ans=0.2 2023-11-19 13:00:34,399 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.21 vs. limit=15.0 2023-11-19 13:00:36,230 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=747200.0, ans=0.125 2023-11-19 13:01:08,405 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 3900, loss[loss=0.0905, simple_loss=0.1169, pruned_loss=0.02262, audio_tagging_loss=0.00941, over 15147.00 frames. ], tot_loss[loss=0.08578, simple_loss=0.105, pruned_loss=0.02287, audio_tagging_loss=0.01039, over 3051757.46 frames. ], batch size: 54, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 13:01:10,835 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=747400.0, ans=0.125 2023-11-19 13:01:13,368 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.60 vs. limit=15.0 2023-11-19 13:01:23,249 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.643e+01 8.736e+01 9.302e+01 1.013e+02 3.038e+02, threshold=1.860e+02, percent-clipped=1.0 2023-11-19 13:01:28,889 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=747466.6666666666, ans=0.2 2023-11-19 13:01:37,223 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=747533.3333333334, ans=0.125 2023-11-19 13:01:38,340 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=747533.3333333334, ans=0.1 2023-11-19 13:01:41,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=747600.0, ans=0.125 2023-11-19 13:01:41,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=747600.0, ans=0.125 2023-11-19 13:01:44,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=747600.0, ans=0.125 2023-11-19 13:01:48,461 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.11 vs. limit=10.0 2023-11-19 13:02:04,251 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 3950, loss[loss=0.0946, simple_loss=0.1168, pruned_loss=0.02764, audio_tagging_loss=0.00854, over 15386.00 frames. ], tot_loss[loss=0.0864, simple_loss=0.1058, pruned_loss=0.02292, audio_tagging_loss=0.01055, over 3053814.27 frames. 
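Across this section the learning rate drifts from 7.07e-03 down to 7.00e-03 as batch_count climbs from ~732.5k to ~749k, the slow power-law decay of an Eden-style schedule as used in Zipformer recipes. A sketch of that rule; the knee constants and how batch/epoch are counted here are assumptions, so only the relative decay (about 1% over this stretch) is checked against the log:

    def eden_lr(base_lr: float, batch: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # decays as ~batch^-0.5 and ~epoch^-0.5 well past the knee points
        return (base_lr
                * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
                * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

    # ratio over this stretch of the log: ~1.011, i.e. the same ~1% decay
    # seen from lr: 7.07e-03 down to lr: 7.00e-03
    print(eden_lr(0.045, 732533, 10) / eden_lr(0.045, 749000, 10))
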
], batch size: 57, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 13:02:12,408 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=747733.3333333334, ans=0.04949747468305833 2023-11-19 13:02:16,087 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=747800.0, ans=0.0 2023-11-19 13:02:24,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=747800.0, ans=0.1 2023-11-19 13:02:26,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=747866.6666666666, ans=0.125 2023-11-19 13:02:32,300 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=747866.6666666666, ans=0.5 2023-11-19 13:02:48,229 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.93 vs. limit=15.0 2023-11-19 13:03:01,255 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 4000, loss[loss=0.08818, simple_loss=0.09558, pruned_loss=0.02899, audio_tagging_loss=0.0114, over 13957.00 frames. ], tot_loss[loss=0.08642, simple_loss=0.1059, pruned_loss=0.02296, audio_tagging_loss=0.01052, over 3053488.97 frames. ], batch size: 54, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 13:03:03,975 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.36 vs. limit=22.5 2023-11-19 13:03:05,720 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=748066.6666666666, ans=0.1 2023-11-19 13:03:14,911 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.731e+01 8.515e+01 9.235e+01 1.023e+02 1.834e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 13:03:56,341 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 4050, loss[loss=0.1074, simple_loss=0.1331, pruned_loss=0.03178, audio_tagging_loss=0.009094, over 15622.00 frames. ], tot_loss[loss=0.08641, simple_loss=0.1057, pruned_loss=0.02297, audio_tagging_loss=0.01058, over 3052508.00 frames. ], batch size: 56, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 13:03:58,504 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 13:04:17,179 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=748533.3333333334, ans=0.1 2023-11-19 13:04:35,167 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=748600.0, ans=0.2 2023-11-19 13:04:43,901 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.84 vs. limit=12.0 2023-11-19 13:04:51,015 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.55 vs. 
limit=22.5 2023-11-19 13:04:51,375 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 4100, loss[loss=0.09191, simple_loss=0.111, pruned_loss=0.02654, audio_tagging_loss=0.009872, over 15056.00 frames. ], tot_loss[loss=0.08621, simple_loss=0.1053, pruned_loss=0.02285, audio_tagging_loss=0.0107, over 3052175.16 frames. ], batch size: 56, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 13:04:53,064 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.99 vs. limit=10.0 2023-11-19 13:04:54,788 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=748733.3333333334, ans=0.07 2023-11-19 13:04:56,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=748733.3333333334, ans=0.1 2023-11-19 13:04:59,376 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.22 vs. limit=15.0 2023-11-19 13:05:06,275 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.287e+01 8.382e+01 9.039e+01 9.605e+01 1.210e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 13:05:12,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=748866.6666666666, ans=0.125 2023-11-19 13:05:14,953 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=748866.6666666666, ans=0.0 2023-11-19 13:05:17,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=748866.6666666666, ans=0.1 2023-11-19 13:05:35,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=749000.0, ans=0.2 2023-11-19 13:05:40,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=749000.0, ans=0.07 2023-11-19 13:05:46,712 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 4150, loss[loss=0.09874, simple_loss=0.1206, pruned_loss=0.02928, audio_tagging_loss=0.009159, over 15071.00 frames. ], tot_loss[loss=0.08596, simple_loss=0.1048, pruned_loss=0.02295, audio_tagging_loss=0.0106, over 3049119.52 frames. 
], batch size: 56, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 13:05:46,991 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=749066.6666666666, ans=0.0 2023-11-19 13:05:48,052 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=749066.6666666666, ans=0.09899494936611666 2023-11-19 13:05:49,129 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=749066.6666666666, ans=0.125 2023-11-19 13:05:54,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=749066.6666666666, ans=0.125 2023-11-19 13:06:04,902 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=749133.3333333334, ans=0.125 2023-11-19 13:06:11,881 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=749200.0, ans=0.1 2023-11-19 13:06:25,582 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 13:06:41,686 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 4200, loss[loss=0.07608, simple_loss=0.09045, pruned_loss=0.01784, audio_tagging_loss=0.01301, over 15137.00 frames. ], tot_loss[loss=0.08595, simple_loss=0.1051, pruned_loss=0.02288, audio_tagging_loss=0.01053, over 3051351.59 frames. ], batch size: 57, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 13:06:55,964 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.871e+01 8.337e+01 9.071e+01 1.010e+02 1.242e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-19 13:06:59,512 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.83 vs. limit=15.0 2023-11-19 13:07:26,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=749666.6666666666, ans=0.125 2023-11-19 13:07:31,143 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=749666.6666666666, ans=6.0 2023-11-19 13:07:37,278 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 4250, loss[loss=0.08397, simple_loss=0.1066, pruned_loss=0.02117, audio_tagging_loss=0.009519, over 15348.00 frames. ], tot_loss[loss=0.08634, simple_loss=0.1061, pruned_loss=0.02285, audio_tagging_loss=0.01043, over 3053209.77 frames. 
], batch size: 57, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 13:07:50,284 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=749800.0, ans=0.0 2023-11-19 13:07:59,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=749866.6666666666, ans=0.07 2023-11-19 13:07:59,334 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=749866.6666666666, ans=0.125 2023-11-19 13:08:20,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=750000.0, ans=0.0 2023-11-19 13:08:32,534 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 4300, loss[loss=0.09443, simple_loss=0.1193, pruned_loss=0.02471, audio_tagging_loss=0.01009, over 15240.00 frames. ], tot_loss[loss=0.08658, simple_loss=0.1064, pruned_loss=0.02306, audio_tagging_loss=0.0103, over 3054311.19 frames. ], batch size: 57, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 13:08:47,738 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.890e+01 8.411e+01 9.252e+01 1.004e+02 1.296e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-19 13:08:48,049 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=750133.3333333334, ans=0.125 2023-11-19 13:09:11,746 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=750266.6666666666, ans=0.07 2023-11-19 13:09:12,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=750266.6666666666, ans=0.125 2023-11-19 13:09:28,938 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 4350, loss[loss=0.0616, simple_loss=0.06863, pruned_loss=0.01167, audio_tagging_loss=0.01561, over 14843.00 frames. ], tot_loss[loss=0.0863, simple_loss=0.1061, pruned_loss=0.02296, audio_tagging_loss=0.01028, over 3059216.29 frames. ], batch size: 57, lr: 6.99e-03, grad_scale: 16.0 2023-11-19 13:09:44,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=750466.6666666666, ans=0.04949747468305833 2023-11-19 13:09:48,698 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:09:51,772 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=750533.3333333334, ans=0.2 2023-11-19 13:10:05,319 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=750600.0, ans=0.0 2023-11-19 13:10:19,331 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=750666.6666666666, ans=0.0 2023-11-19 13:10:24,883 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 4400, loss[loss=0.1269, simple_loss=0.153, pruned_loss=0.04009, audio_tagging_loss=0.01035, over 16045.00 frames. ], tot_loss[loss=0.08672, simple_loss=0.1067, pruned_loss=0.02321, audio_tagging_loss=0.01018, over 3057470.56 frames. 
], batch size: 56, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 13:10:37,966 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:10:39,851 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.801e+01 8.194e+01 9.026e+01 9.942e+01 1.275e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-19 13:10:52,756 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.97 vs. limit=15.0 2023-11-19 13:10:57,918 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=750933.3333333334, ans=0.0 2023-11-19 13:11:04,425 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=750933.3333333334, ans=0.125 2023-11-19 13:11:20,612 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 4450, loss[loss=0.06672, simple_loss=0.07595, pruned_loss=0.01869, audio_tagging_loss=0.01005, over 15895.00 frames. ], tot_loss[loss=0.08646, simple_loss=0.1062, pruned_loss=0.02318, audio_tagging_loss=0.01016, over 3056440.22 frames. ], batch size: 61, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 13:11:39,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=751133.3333333334, ans=0.125 2023-11-19 13:11:39,780 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.99 vs. limit=12.0 2023-11-19 13:11:40,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=751133.3333333334, ans=0.0 2023-11-19 13:11:47,597 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.68 vs. limit=15.0 2023-11-19 13:11:57,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=751266.6666666666, ans=0.1 2023-11-19 13:12:16,423 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 4500, loss[loss=0.06311, simple_loss=0.08607, pruned_loss=0.01263, audio_tagging_loss=0.007435, over 13928.00 frames. ], tot_loss[loss=0.086, simple_loss=0.1057, pruned_loss=0.02293, audio_tagging_loss=0.01023, over 3059255.42 frames. ], batch size: 52, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 13:12:32,376 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.486e+01 8.202e+01 9.163e+01 9.982e+01 1.315e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-19 13:13:04,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=751666.6666666666, ans=0.025 2023-11-19 13:13:06,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=751666.6666666666, ans=0.1 2023-11-19 13:13:10,108 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=751666.6666666666, ans=0.0 2023-11-19 13:13:12,552 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 4550, loss[loss=0.1174, simple_loss=0.1508, pruned_loss=0.03347, audio_tagging_loss=0.008566, over 15024.00 frames. ], tot_loss[loss=0.08537, simple_loss=0.1052, pruned_loss=0.02261, audio_tagging_loss=0.01017, over 3051542.58 frames. 
], batch size: 54, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 13:13:21,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=751733.3333333334, ans=0.125 2023-11-19 13:13:36,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=751866.6666666666, ans=0.0 2023-11-19 13:13:54,737 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 13:14:02,719 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=752000.0, ans=0.125 2023-11-19 13:14:07,798 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 4600, loss[loss=0.07378, simple_loss=0.07898, pruned_loss=0.02178, audio_tagging_loss=0.0125, over 14249.00 frames. ], tot_loss[loss=0.08534, simple_loss=0.1047, pruned_loss=0.02269, audio_tagging_loss=0.01028, over 3049419.29 frames. ], batch size: 55, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 13:14:07,921 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=752066.6666666666, ans=0.1 2023-11-19 13:14:13,917 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=752066.6666666666, ans=0.125 2023-11-19 13:14:21,962 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=752133.3333333334, ans=0.125 2023-11-19 13:14:23,753 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.499e+01 8.092e+01 8.852e+01 9.685e+01 1.325e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-19 13:14:24,061 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=752133.3333333334, ans=0.0 2023-11-19 13:14:31,970 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=752200.0, ans=0.2 2023-11-19 13:14:52,544 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=752333.3333333334, ans=0.125 2023-11-19 13:14:57,714 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.50 vs. limit=15.0 2023-11-19 13:15:04,122 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 4650, loss[loss=0.07579, simple_loss=0.09057, pruned_loss=0.0177, audio_tagging_loss=0.01281, over 16031.00 frames. ], tot_loss[loss=0.08608, simple_loss=0.1056, pruned_loss=0.02292, audio_tagging_loss=0.01035, over 3050851.61 frames. 
], batch size: 61, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 13:15:16,609 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=752466.6666666666, ans=0.125 2023-11-19 13:15:34,951 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=752533.3333333334, ans=0.0 2023-11-19 13:15:37,100 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=752600.0, ans=0.025 2023-11-19 13:15:59,523 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 4700, loss[loss=0.1131, simple_loss=0.1332, pruned_loss=0.0365, audio_tagging_loss=0.009937, over 15348.00 frames. ], tot_loss[loss=0.0872, simple_loss=0.1066, pruned_loss=0.02342, audio_tagging_loss=0.01047, over 3058035.98 frames. ], batch size: 58, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:16:09,745 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=752800.0, ans=0.125 2023-11-19 13:16:14,819 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.996e+01 8.706e+01 9.585e+01 1.066e+02 1.440e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-19 13:16:16,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=752800.0, ans=0.125 2023-11-19 13:16:26,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=752866.6666666666, ans=0.125 2023-11-19 13:16:47,075 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.10 vs. limit=6.0 2023-11-19 13:16:48,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=753000.0, ans=0.125 2023-11-19 13:16:49,905 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=753000.0, ans=0.2 2023-11-19 13:16:54,894 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 4750, loss[loss=0.07574, simple_loss=0.08142, pruned_loss=0.02081, audio_tagging_loss=0.01421, over 14164.00 frames. ], tot_loss[loss=0.08646, simple_loss=0.1052, pruned_loss=0.02321, audio_tagging_loss=0.01063, over 3054956.04 frames. ], batch size: 56, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:16:58,326 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=753066.6666666666, ans=0.0 2023-11-19 13:16:58,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=753066.6666666666, ans=0.0 2023-11-19 13:17:07,351 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.48 vs. limit=15.0 2023-11-19 13:17:42,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=753333.3333333334, ans=0.1 2023-11-19 13:17:51,340 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 4800, loss[loss=0.0833, simple_loss=0.1095, pruned_loss=0.02, audio_tagging_loss=0.008533, over 15959.00 frames. ], tot_loss[loss=0.08644, simple_loss=0.105, pruned_loss=0.02317, audio_tagging_loss=0.01075, over 3058609.83 frames. 
], batch size: 59, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:17:54,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=753400.0, ans=0.1 2023-11-19 13:18:06,144 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.271e+01 9.115e+01 1.017e+02 1.442e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-19 13:18:14,400 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=753533.3333333334, ans=0.125 2023-11-19 13:18:15,858 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.14 vs. limit=15.0 2023-11-19 13:18:27,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=753600.0, ans=0.05 2023-11-19 13:18:31,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=753600.0, ans=0.0 2023-11-19 13:18:37,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=753666.6666666666, ans=0.125 2023-11-19 13:18:45,799 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 4850, loss[loss=0.097, simple_loss=0.1264, pruned_loss=0.02456, audio_tagging_loss=0.009238, over 15318.00 frames. ], tot_loss[loss=0.08686, simple_loss=0.1055, pruned_loss=0.02326, audio_tagging_loss=0.01084, over 3055595.86 frames. ], batch size: 55, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:19:12,755 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=22.5 2023-11-19 13:19:13,658 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=753866.6666666666, ans=0.5 2023-11-19 13:19:25,564 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.50 vs. limit=15.0 2023-11-19 13:19:29,216 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=754000.0, ans=0.125 2023-11-19 13:19:35,705 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=754000.0, ans=0.07 2023-11-19 13:19:39,988 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=754000.0, ans=0.125 2023-11-19 13:19:41,816 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 4900, loss[loss=0.08752, simple_loss=0.1078, pruned_loss=0.0237, audio_tagging_loss=0.009914, over 14230.00 frames. ], tot_loss[loss=0.08634, simple_loss=0.1052, pruned_loss=0.02304, audio_tagging_loss=0.0107, over 3052472.19 frames. ], batch size: 54, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:19:57,513 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.439e+01 8.307e+01 9.002e+01 9.755e+01 1.261e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-19 13:20:05,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=754200.0, ans=0.05 2023-11-19 13:20:06,863 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.76 vs. 
limit=8.0 2023-11-19 13:20:24,177 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=754266.6666666666, ans=0.125 2023-11-19 13:20:37,466 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 4950, loss[loss=0.07731, simple_loss=0.09423, pruned_loss=0.02004, audio_tagging_loss=0.01015, over 15088.00 frames. ], tot_loss[loss=0.08586, simple_loss=0.105, pruned_loss=0.02288, audio_tagging_loss=0.01047, over 3051832.63 frames. ], batch size: 58, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:21:03,641 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.22 vs. limit=10.0 2023-11-19 13:21:09,653 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=754600.0, ans=0.09899494936611666 2023-11-19 13:21:12,833 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=754600.0, ans=0.125 2023-11-19 13:21:32,189 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0 2023-11-19 13:21:32,755 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 5000, loss[loss=0.0602, simple_loss=0.07191, pruned_loss=0.0139, audio_tagging_loss=0.01034, over 15545.00 frames. ], tot_loss[loss=0.08592, simple_loss=0.1054, pruned_loss=0.02288, audio_tagging_loss=0.01032, over 3057696.39 frames. ], batch size: 60, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:21:37,785 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=754733.3333333334, ans=0.0 2023-11-19 13:21:48,529 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.750e+01 8.260e+01 8.986e+01 1.009e+02 1.320e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-19 13:22:00,472 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.95 vs. limit=12.0 2023-11-19 13:22:01,560 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=754866.6666666666, ans=0.1 2023-11-19 13:22:13,978 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=754933.3333333334, ans=0.125 2023-11-19 13:22:22,984 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=755000.0, ans=0.125 2023-11-19 13:22:28,174 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 5050, loss[loss=0.09966, simple_loss=0.121, pruned_loss=0.02932, audio_tagging_loss=0.009842, over 15011.00 frames. ], tot_loss[loss=0.08602, simple_loss=0.1061, pruned_loss=0.02286, audio_tagging_loss=0.01012, over 3055662.02 frames. ], batch size: 55, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 13:22:28,626 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.63 vs. 
limit=15.0 2023-11-19 13:22:38,912 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=755133.3333333334, ans=0.1 2023-11-19 13:22:40,075 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=755133.3333333334, ans=0.2 2023-11-19 13:22:43,718 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=755133.3333333334, ans=0.0 2023-11-19 13:22:50,650 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.98 vs. limit=12.0 2023-11-19 13:23:04,350 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=755266.6666666666, ans=0.1 2023-11-19 13:23:14,673 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=755333.3333333334, ans=10.0 2023-11-19 13:23:15,767 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=755333.3333333334, ans=0.125 2023-11-19 13:23:15,811 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=755333.3333333334, ans=0.125 2023-11-19 13:23:23,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=755400.0, ans=0.125 2023-11-19 13:23:24,485 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 5100, loss[loss=0.05799, simple_loss=0.06625, pruned_loss=0.01292, audio_tagging_loss=0.01194, over 14972.00 frames. ], tot_loss[loss=0.08559, simple_loss=0.1051, pruned_loss=0.02281, audio_tagging_loss=0.01021, over 3050188.49 frames. ], batch size: 58, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 13:23:29,391 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=755400.0, ans=0.0 2023-11-19 13:23:39,694 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.354e+01 8.218e+01 8.903e+01 1.035e+02 1.339e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-19 13:23:39,931 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=755466.6666666666, ans=0.025 2023-11-19 13:23:56,990 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=755600.0, ans=0.1 2023-11-19 13:23:59,194 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=755600.0, ans=0.0 2023-11-19 13:24:02,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=755600.0, ans=0.1 2023-11-19 13:24:04,430 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.99 vs. limit=22.5 2023-11-19 13:24:20,065 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 5150, loss[loss=0.08222, simple_loss=0.1018, pruned_loss=0.02267, audio_tagging_loss=0.00867, over 15025.00 frames. ], tot_loss[loss=0.08538, simple_loss=0.1049, pruned_loss=0.02271, audio_tagging_loss=0.01024, over 3046303.76 frames. 
], batch size: 55, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 13:24:22,895 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.53 vs. limit=15.0 2023-11-19 13:24:59,338 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=755933.3333333334, ans=0.2 2023-11-19 13:25:07,240 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=756000.0, ans=0.2 2023-11-19 13:25:15,941 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 5200, loss[loss=0.07754, simple_loss=0.1013, pruned_loss=0.01874, audio_tagging_loss=0.008154, over 15701.00 frames. ], tot_loss[loss=0.08583, simple_loss=0.1051, pruned_loss=0.02303, audio_tagging_loss=0.01024, over 3050140.34 frames. ], batch size: 60, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 13:25:18,221 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=756066.6666666666, ans=0.125 2023-11-19 13:25:25,055 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=756066.6666666666, ans=0.125 2023-11-19 13:25:31,600 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.033e+01 8.473e+01 9.085e+01 1.039e+02 1.273e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-19 13:25:33,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=756133.3333333334, ans=0.1 2023-11-19 13:25:44,151 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=756200.0, ans=0.125 2023-11-19 13:26:05,206 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.41 vs. limit=6.0 2023-11-19 13:26:11,821 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 5250, loss[loss=0.06034, simple_loss=0.06975, pruned_loss=0.01187, audio_tagging_loss=0.01359, over 14564.00 frames. ], tot_loss[loss=0.08563, simple_loss=0.1051, pruned_loss=0.02282, audio_tagging_loss=0.01027, over 3051059.62 frames. ], batch size: 56, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 13:26:30,496 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=756466.6666666666, ans=0.0 2023-11-19 13:26:44,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=756600.0, ans=0.125 2023-11-19 13:26:47,028 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=756600.0, ans=0.125 2023-11-19 13:26:50,183 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=756600.0, ans=0.0 2023-11-19 13:26:54,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=756600.0, ans=0.125 2023-11-19 13:26:54,665 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=22.5 2023-11-19 13:27:07,156 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 5300, loss[loss=0.08368, simple_loss=0.1005, pruned_loss=0.02164, audio_tagging_loss=0.01179, over 15499.00 frames. 
], tot_loss[loss=0.08545, simple_loss=0.1047, pruned_loss=0.02277, audio_tagging_loss=0.01033, over 3054552.54 frames. ], batch size: 57, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 13:27:09,016 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.63 vs. limit=15.0 2023-11-19 13:27:10,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=756733.3333333334, ans=0.0 2023-11-19 13:27:12,747 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=756733.3333333334, ans=0.5 2023-11-19 13:27:22,575 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.871e+01 8.251e+01 9.151e+01 1.020e+02 1.250e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-19 13:27:29,212 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=756866.6666666666, ans=0.0 2023-11-19 13:27:29,396 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0 2023-11-19 13:27:36,572 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:27:49,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=756933.3333333334, ans=0.125 2023-11-19 13:27:54,001 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=757000.0, ans=0.125 2023-11-19 13:28:02,882 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 5350, loss[loss=0.113, simple_loss=0.1371, pruned_loss=0.0358, audio_tagging_loss=0.008689, over 14569.00 frames. ], tot_loss[loss=0.0857, simple_loss=0.1048, pruned_loss=0.02289, audio_tagging_loss=0.01039, over 3049946.72 frames. 
], batch size: 55, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:28:05,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=757066.6666666666, ans=0.025 2023-11-19 13:28:10,466 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=757066.6666666666, ans=0.1 2023-11-19 13:28:14,788 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:28:21,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=757133.3333333334, ans=0.1 2023-11-19 13:28:31,738 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=757200.0, ans=0.125 2023-11-19 13:28:35,890 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=757266.6666666666, ans=0.2 2023-11-19 13:28:37,992 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=757266.6666666666, ans=0.0 2023-11-19 13:28:44,873 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=757266.6666666666, ans=0.125 2023-11-19 13:28:57,748 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 5400, loss[loss=0.07608, simple_loss=0.09677, pruned_loss=0.01828, audio_tagging_loss=0.009415, over 15475.00 frames. ], tot_loss[loss=0.08528, simple_loss=0.1045, pruned_loss=0.02257, audio_tagging_loss=0.01044, over 3051017.17 frames. ], batch size: 56, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:29:14,079 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.833e+01 8.379e+01 8.876e+01 9.570e+01 1.112e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-19 13:29:29,770 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=757533.3333333334, ans=0.125 2023-11-19 13:29:30,834 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=757600.0, ans=0.0 2023-11-19 13:29:47,170 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.81 vs. limit=15.0 2023-11-19 13:29:47,632 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=757666.6666666666, ans=0.1 2023-11-19 13:29:47,686 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=757666.6666666666, ans=0.125 2023-11-19 13:29:51,346 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=757666.6666666666, ans=0.2 2023-11-19 13:29:53,965 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.53 vs. limit=15.0 2023-11-19 13:29:54,348 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 5450, loss[loss=0.09928, simple_loss=0.1234, pruned_loss=0.02748, audio_tagging_loss=0.01011, over 15007.00 frames. ], tot_loss[loss=0.08641, simple_loss=0.1058, pruned_loss=0.02311, audio_tagging_loss=0.0104, over 3050801.53 frames. 
], batch size: 56, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:30:11,130 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=757800.0, ans=0.0 2023-11-19 13:30:18,277 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=757866.6666666666, ans=0.2 2023-11-19 13:30:18,360 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=757866.6666666666, ans=0.2 2023-11-19 13:30:22,682 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=757866.6666666666, ans=0.1 2023-11-19 13:30:23,709 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=757866.6666666666, ans=0.0 2023-11-19 13:30:29,975 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=757933.3333333334, ans=0.07 2023-11-19 13:30:49,726 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 5500, loss[loss=0.06452, simple_loss=0.07918, pruned_loss=0.01295, audio_tagging_loss=0.01198, over 15125.00 frames. ], tot_loss[loss=0.08609, simple_loss=0.1056, pruned_loss=0.02287, audio_tagging_loss=0.01042, over 3052560.07 frames. ], batch size: 57, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:30:59,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=758133.3333333334, ans=0.1 2023-11-19 13:31:04,522 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.775e+01 8.323e+01 9.025e+01 9.961e+01 1.664e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-19 13:31:12,214 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=758200.0, ans=0.0 2023-11-19 13:31:13,273 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=758200.0, ans=0.125 2023-11-19 13:31:22,932 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.23 vs. limit=8.0 2023-11-19 13:31:27,541 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:31:30,444 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. limit=6.0 2023-11-19 13:31:31,126 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=758266.6666666666, ans=0.0 2023-11-19 13:31:36,523 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=758333.3333333334, ans=0.0 2023-11-19 13:31:44,671 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 5550, loss[loss=0.09144, simple_loss=0.1089, pruned_loss=0.025, audio_tagging_loss=0.01201, over 16953.00 frames. ], tot_loss[loss=0.08596, simple_loss=0.1052, pruned_loss=0.0229, audio_tagging_loss=0.01048, over 3054927.99 frames. ], batch size: 62, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:31:46,196 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.41 vs. 
limit=15.0 2023-11-19 13:31:48,039 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=758400.0, ans=0.125 2023-11-19 13:31:54,158 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=758400.0, ans=0.125 2023-11-19 13:32:08,676 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=758533.3333333334, ans=0.0 2023-11-19 13:32:14,375 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=758533.3333333334, ans=0.2 2023-11-19 13:32:29,506 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=758666.6666666666, ans=0.125 2023-11-19 13:32:33,441 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2023-11-19 13:32:40,815 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 5600, loss[loss=0.09863, simple_loss=0.1229, pruned_loss=0.02845, audio_tagging_loss=0.008735, over 15099.00 frames. ], tot_loss[loss=0.08612, simple_loss=0.1057, pruned_loss=0.02277, audio_tagging_loss=0.0105, over 3063395.93 frames. ], batch size: 54, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:32:42,650 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=758733.3333333334, ans=0.125 2023-11-19 13:32:45,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=758733.3333333334, ans=0.0 2023-11-19 13:32:54,683 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=758800.0, ans=0.1 2023-11-19 13:32:56,429 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.309e+01 8.349e+01 9.140e+01 1.023e+02 1.369e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-19 13:33:10,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=758866.6666666666, ans=0.2 2023-11-19 13:33:18,574 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 13:33:36,687 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 5650, loss[loss=0.1074, simple_loss=0.1403, pruned_loss=0.02787, audio_tagging_loss=0.009319, over 15096.00 frames. ], tot_loss[loss=0.08665, simple_loss=0.1064, pruned_loss=0.02297, audio_tagging_loss=0.01047, over 3068335.60 frames. 
], batch size: 53, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:33:42,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=759066.6666666666, ans=0.07 2023-11-19 13:33:45,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=759066.6666666666, ans=0.125 2023-11-19 13:34:14,547 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=759266.6666666666, ans=0.125 2023-11-19 13:34:27,688 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=759333.3333333334, ans=0.125 2023-11-19 13:34:31,772 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 5700, loss[loss=0.08831, simple_loss=0.1142, pruned_loss=0.02338, audio_tagging_loss=0.007828, over 14805.00 frames. ], tot_loss[loss=0.08677, simple_loss=0.1064, pruned_loss=0.02305, audio_tagging_loss=0.01049, over 3062439.23 frames. ], batch size: 55, lr: 6.94e-03, grad_scale: 32.0 2023-11-19 13:34:34,598 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=759400.0, ans=6.0 2023-11-19 13:34:45,775 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=759466.6666666666, ans=0.0 2023-11-19 13:34:47,528 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.640e+01 8.102e+01 8.841e+01 9.614e+01 1.155e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-19 13:35:04,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=759600.0, ans=0.125 2023-11-19 13:35:09,074 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.33 vs. limit=15.0 2023-11-19 13:35:23,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=759666.6666666666, ans=0.2 2023-11-19 13:35:27,570 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 5750, loss[loss=0.06646, simple_loss=0.07123, pruned_loss=0.01592, audio_tagging_loss=0.01493, over 15166.00 frames. ], tot_loss[loss=0.0864, simple_loss=0.1059, pruned_loss=0.02295, audio_tagging_loss=0.01053, over 3060471.88 frames. ], batch size: 56, lr: 6.94e-03, grad_scale: 32.0 2023-11-19 13:35:28,816 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=759733.3333333334, ans=0.2 2023-11-19 13:35:37,230 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.65 vs. limit=15.0 2023-11-19 13:35:44,701 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=759800.0, ans=0.1 2023-11-19 13:35:57,443 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=759866.6666666666, ans=0.0 2023-11-19 13:36:22,070 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.62 vs. 
limit=22.5 2023-11-19 13:36:23,405 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 5800, loss[loss=0.1072, simple_loss=0.1305, pruned_loss=0.03361, audio_tagging_loss=0.00833, over 15122.00 frames. ], tot_loss[loss=0.08572, simple_loss=0.1051, pruned_loss=0.02275, audio_tagging_loss=0.01041, over 3062669.49 frames. ], batch size: 56, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 13:36:26,709 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.26 vs. limit=15.0 2023-11-19 13:36:28,456 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=760066.6666666666, ans=0.0 2023-11-19 13:36:39,822 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.949e+01 8.575e+01 9.143e+01 9.990e+01 1.422e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-19 13:36:56,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=760266.6666666666, ans=0.1 2023-11-19 13:37:15,163 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=760333.3333333334, ans=0.125 2023-11-19 13:37:19,218 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 5850, loss[loss=0.1002, simple_loss=0.1178, pruned_loss=0.03332, audio_tagging_loss=0.007993, over 14778.00 frames. ], tot_loss[loss=0.08488, simple_loss=0.1039, pruned_loss=0.02254, audio_tagging_loss=0.01037, over 3044225.45 frames. ], batch size: 55, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 13:37:20,520 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=760400.0, ans=0.95 2023-11-19 13:37:25,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=760400.0, ans=0.125 2023-11-19 13:37:36,234 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=760466.6666666666, ans=0.125 2023-11-19 13:37:38,239 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=760466.6666666666, ans=0.2 2023-11-19 13:37:38,261 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=760466.6666666666, ans=0.2 2023-11-19 13:37:38,657 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.86 vs. limit=22.5 2023-11-19 13:37:41,483 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=760533.3333333334, ans=0.1 2023-11-19 13:38:08,693 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=760666.6666666666, ans=0.0 2023-11-19 13:38:15,423 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 5900, loss[loss=0.07682, simple_loss=0.08621, pruned_loss=0.02124, audio_tagging_loss=0.01247, over 16618.00 frames. ], tot_loss[loss=0.08593, simple_loss=0.1055, pruned_loss=0.02285, audio_tagging_loss=0.01031, over 3052725.97 frames. 
], batch size: 65, lr: 6.94e-03, grad_scale: 16.0
2023-11-19 13:38:32,424 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.639e+01 8.376e+01 9.188e+01 1.002e+02 1.553e+02, threshold=1.838e+02, percent-clipped=0.0
2023-11-19 13:39:02,159 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=761000.0, ans=0.125
2023-11-19 13:39:07,371 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=761000.0, ans=0.2
2023-11-19 13:39:10,371 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 5950, loss[loss=0.105, simple_loss=0.1431, pruned_loss=0.02733, audio_tagging_loss=0.006129, over 15300.00 frames. ], tot_loss[loss=0.08536, simple_loss=0.1052, pruned_loss=0.02251, audio_tagging_loss=0.01028, over 3046757.20 frames. ], batch size: 52, lr: 6.94e-03, grad_scale: 16.0
2023-11-19 13:39:11,022 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.40 vs. limit=6.0
2023-11-19 13:39:27,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=761133.3333333334, ans=0.125
2023-11-19 13:39:29,236 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=761133.3333333334, ans=0.125
2023-11-19 13:39:36,651 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=761200.0, ans=0.2
2023-11-19 13:39:40,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=761200.0, ans=0.07
2023-11-19 13:39:49,193 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=761266.6666666666, ans=0.125
2023-11-19 13:40:06,249 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 6000, loss[loss=0.06895, simple_loss=0.08017, pruned_loss=0.01817, audio_tagging_loss=0.0107, over 15478.00 frames. ], tot_loss[loss=0.08495, simple_loss=0.1047, pruned_loss=0.02229, audio_tagging_loss=0.01033, over 3046866.72 frames. ], batch size: 60, lr: 6.93e-03, grad_scale: 32.0
2023-11-19 13:40:06,250 INFO [train_asr.py:1138] (3/4) Computing validation loss
2023-11-19 13:40:22,619 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.1523, 2.5857, 3.9488, 2.8835], device='cuda:3')
2023-11-19 13:40:23,956 INFO [zipformer.py:1873] (3/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.9088, 3.2326, 4.9006, 4.3619], device='cuda:3')
2023-11-19 13:40:38,516 INFO [train_asr.py:1147] (3/4) Epoch 10, validation: loss=0.06367, simple_loss=0.05534, pruned_loss=0.00639, audio_tagging_loss=0.02961, over 4681554.00 frames.
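The logged totals in these records are consistent with a weighted combination of the parts: loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss (for the validation record above, 0.5 * 0.05534 + 0.00639 + 0.02961 = 0.06367). Below is a minimal sketch that checks this relation; the 0.5 weight on simple_loss is inferred from the logged numbers, not read from train_asr.py, so treat it as an assumption.

def combine_losses(simple_loss: float, pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_loss_scale: float = 0.5) -> float:
    # Recombine the logged components into the logged total.
    # The 0.5 scale on simple_loss is an inferred assumption.
    return simple_loss_scale * simple_loss + pruned_loss + audio_tagging_loss

# Validation record above: loss=0.06367, simple_loss=0.05534,
# pruned_loss=0.00639, audio_tagging_loss=0.02961.
assert abs(combine_losses(0.05534, 0.00639, 0.02961) - 0.06367) < 1e-4
# The running tot_loss at batch 6000 matches to rounding as well.
assert abs(combine_losses(0.1047, 0.02229, 0.01033) - 0.08495) < 1e-3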
2023-11-19 13:40:38,517 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB
2023-11-19 13:40:39,847 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=761400.0, ans=0.2
2023-11-19 13:40:46,128 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=761400.0, ans=0.125
2023-11-19 13:40:55,647 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.484e+01 8.242e+01 8.869e+01 9.811e+01 1.293e+02, threshold=1.774e+02, percent-clipped=0.0
2023-11-19 13:41:13,190 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=761600.0, ans=0.125
2023-11-19 13:41:17,845 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 13:41:27,796 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.92 vs. limit=15.0
2023-11-19 13:41:34,258 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 6050, loss[loss=0.06022, simple_loss=0.0679, pruned_loss=0.01498, audio_tagging_loss=0.01129, over 15686.00 frames. ], tot_loss[loss=0.08531, simple_loss=0.1051, pruned_loss=0.02249, audio_tagging_loss=0.01026, over 3046152.71 frames. ], batch size: 61, lr: 6.93e-03, grad_scale: 32.0
2023-11-19 13:41:40,620 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. limit=6.0
2023-11-19 13:41:53,512 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=761800.0, ans=0.125
2023-11-19 13:42:05,809 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=761866.6666666666, ans=0.125
2023-11-19 13:42:13,150 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.362e-02
2023-11-19 13:42:22,773 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=762000.0, ans=0.125
2023-11-19 13:42:30,041 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 6100, loss[loss=0.0754, simple_loss=0.09163, pruned_loss=0.01812, audio_tagging_loss=0.01147, over 14314.00 frames. ], tot_loss[loss=0.08585, simple_loss=0.1057, pruned_loss=0.0228, audio_tagging_loss=0.01018, over 3053639.84 frames. ], batch size: 54, lr: 6.93e-03, grad_scale: 32.0
2023-11-19 13:42:46,878 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.906e+01 8.561e+01 9.384e+01 1.050e+02 1.586e+02, threshold=1.877e+02, percent-clipped=0.0
2023-11-19 13:42:52,829 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=22.5
2023-11-19 13:43:13,582 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=762333.3333333334, ans=0.2
2023-11-19 13:43:22,587 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=762333.3333333334, ans=0.125
2023-11-19 13:43:22,957 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.15 vs. limit=22.5
2023-11-19 13:43:25,494 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.78 vs. limit=6.0
2023-11-19 13:43:25,976 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 6150, loss[loss=0.08998, simple_loss=0.1237, pruned_loss=0.02169, audio_tagging_loss=0.006458, over 15636.00 frames. ], tot_loss[loss=0.08458, simple_loss=0.1037, pruned_loss=0.0224, audio_tagging_loss=0.01033, over 3048603.26 frames. ], batch size: 57, lr: 6.93e-03, grad_scale: 32.0
2023-11-19 13:43:44,770 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.15 vs. limit=22.5
2023-11-19 13:43:52,189 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 13:44:03,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=762600.0, ans=0.0
2023-11-19 13:44:10,871 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.48 vs. limit=15.0
2023-11-19 13:44:12,584 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=762666.6666666666, ans=0.1
2023-11-19 13:44:12,865 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.34 vs. limit=15.0
2023-11-19 13:44:21,446 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 6200, loss[loss=0.08163, simple_loss=0.1207, pruned_loss=0.01373, audio_tagging_loss=0.007557, over 15134.00 frames. ], tot_loss[loss=0.08424, simple_loss=0.103, pruned_loss=0.02233, audio_tagging_loss=0.01041, over 3037352.58 frames. ], batch size: 53, lr: 6.93e-03, grad_scale: 32.0
2023-11-19 13:44:25,973 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=762733.3333333334, ans=0.1
2023-11-19 13:44:37,804 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.950e+01 8.474e+01 9.020e+01 9.808e+01 1.345e+02, threshold=1.804e+02, percent-clipped=0.0
2023-11-19 13:44:42,878 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=762866.6666666666, ans=0.0
2023-11-19 13:44:45,906 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=762866.6666666666, ans=0.125
2023-11-19 13:44:56,077 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=762933.3333333334, ans=0.125
2023-11-19 13:44:59,200 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=762933.3333333334, ans=0.125
2023-11-19 13:45:10,431 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=763000.0, ans=0.2
2023-11-19 13:45:10,678 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.86 vs. limit=15.0
2023-11-19 13:45:12,392 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=763000.0, ans=0.0
2023-11-19 13:45:12,432 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=763000.0, ans=0.2
2023-11-19 13:45:17,023 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 6250, loss[loss=0.108, simple_loss=0.129, pruned_loss=0.03472, audio_tagging_loss=0.008756, over 14379.00 frames. ], tot_loss[loss=0.08533, simple_loss=0.1041, pruned_loss=0.02278, audio_tagging_loss=0.0105, over 3044197.40 frames. ], batch size: 53, lr: 6.93e-03, grad_scale: 32.0
2023-11-19 13:45:22,519 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=763066.6666666666, ans=0.125
2023-11-19 13:45:29,341 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=763133.3333333334, ans=0.1
2023-11-19 13:46:03,828 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=15.0
2023-11-19 13:46:12,446 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 6300, loss[loss=0.06943, simple_loss=0.07326, pruned_loss=0.01698, audio_tagging_loss=0.01581, over 14806.00 frames. ], tot_loss[loss=0.08488, simple_loss=0.1038, pruned_loss=0.02248, audio_tagging_loss=0.01049, over 3039634.68 frames. ], batch size: 58, lr: 6.93e-03, grad_scale: 32.0
2023-11-19 13:46:18,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=763400.0, ans=0.09899494936611666
2023-11-19 13:46:29,496 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.022e+01 8.302e+01 8.988e+01 1.019e+02 1.261e+02, threshold=1.798e+02, percent-clipped=0.0
2023-11-19 13:46:55,423 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=763600.0, ans=0.1
2023-11-19 13:47:08,310 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 6350, loss[loss=0.05595, simple_loss=0.05191, pruned_loss=0.01608, audio_tagging_loss=0.01391, over 17793.00 frames. ], tot_loss[loss=0.08429, simple_loss=0.1029, pruned_loss=0.0223, audio_tagging_loss=0.01054, over 3042089.28 frames. ], batch size: 70, lr: 6.92e-03, grad_scale: 32.0
2023-11-19 13:47:15,497 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=12.0
2023-11-19 13:47:27,594 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=763800.0, ans=0.125
2023-11-19 13:47:44,550 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=763933.3333333334, ans=0.125
2023-11-19 13:47:57,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=764000.0, ans=0.0
2023-11-19 13:48:03,973 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 6400, loss[loss=0.09182, simple_loss=0.1064, pruned_loss=0.02643, audio_tagging_loss=0.01219, over 14537.00 frames. ], tot_loss[loss=0.08427, simple_loss=0.1027, pruned_loss=0.02227, audio_tagging_loss=0.01066, over 3037764.16 frames. ], batch size: 57, lr: 6.92e-03, grad_scale: 32.0
2023-11-19 13:48:12,222 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=764066.6666666666, ans=0.125
2023-11-19 13:48:22,457 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.848e+01 7.945e+01 8.669e+01 9.496e+01 1.172e+02, threshold=1.734e+02, percent-clipped=0.0
2023-11-19 13:48:22,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=764133.3333333334, ans=0.125
2023-11-19 13:48:40,036 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=764266.6666666666, ans=0.125
2023-11-19 13:48:40,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=764266.6666666666, ans=0.125
2023-11-19 13:48:49,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=764333.3333333334, ans=0.125
2023-11-19 13:48:59,550 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 6450, loss[loss=0.08766, simple_loss=0.1132, pruned_loss=0.0216, audio_tagging_loss=0.00945, over 15581.00 frames. ], tot_loss[loss=0.08448, simple_loss=0.1031, pruned_loss=0.02226, audio_tagging_loss=0.01066, over 3040531.41 frames. ], batch size: 59, lr: 6.92e-03, grad_scale: 32.0
2023-11-19 13:49:05,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=764400.0, ans=0.125
2023-11-19 13:49:30,233 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=764533.3333333334, ans=0.0
2023-11-19 13:49:33,449 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=764600.0, ans=0.07
2023-11-19 13:49:46,645 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=764666.6666666666, ans=0.0
2023-11-19 13:49:54,755 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 6500, loss[loss=0.07262, simple_loss=0.08844, pruned_loss=0.01927, audio_tagging_loss=0.009134, over 15296.00 frames. ], tot_loss[loss=0.08422, simple_loss=0.1029, pruned_loss=0.02218, audio_tagging_loss=0.0106, over 3042870.84 frames. ], batch size: 60, lr: 6.92e-03, grad_scale: 16.0
2023-11-19 13:49:56,082 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=764733.3333333334, ans=0.0
2023-11-19 13:50:06,018 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=764800.0, ans=0.0
2023-11-19 13:50:13,160 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.123e+01 8.395e+01 9.151e+01 9.992e+01 1.336e+02, threshold=1.830e+02, percent-clipped=0.0
2023-11-19 13:50:31,871 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=764933.3333333334, ans=0.0
2023-11-19 13:50:33,855 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=764933.3333333334, ans=0.0
2023-11-19 13:50:39,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=765000.0, ans=0.1
2023-11-19 13:50:50,145 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 6550, loss[loss=0.1015, simple_loss=0.1256, pruned_loss=0.02918, audio_tagging_loss=0.009488, over 14459.00 frames. ], tot_loss[loss=0.08533, simple_loss=0.1047, pruned_loss=0.0226, audio_tagging_loss=0.01036, over 3048087.39 frames. ], batch size: 54, lr: 6.92e-03, grad_scale: 16.0
2023-11-19 13:51:11,565 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.87 vs. limit=22.5
2023-11-19 13:51:27,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=765266.6666666666, ans=0.1
2023-11-19 13:51:35,276 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 13:51:40,501 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=765333.3333333334, ans=0.0
2023-11-19 13:51:45,099 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 6600, loss[loss=0.0944, simple_loss=0.1205, pruned_loss=0.02661, audio_tagging_loss=0.007567, over 14204.00 frames. ], tot_loss[loss=0.08557, simple_loss=0.1051, pruned_loss=0.02279, audio_tagging_loss=0.01024, over 3045583.42 frames. ], batch size: 54, lr: 6.92e-03, grad_scale: 16.0
2023-11-19 13:51:58,711 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.52 vs. limit=15.0
2023-11-19 13:52:01,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=765466.6666666666, ans=0.2
2023-11-19 13:52:03,381 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.780e+01 7.948e+01 8.762e+01 9.685e+01 1.504e+02, threshold=1.752e+02, percent-clipped=0.0
2023-11-19 13:52:18,944 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=765600.0, ans=0.125
2023-11-19 13:52:27,968 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=765666.6666666666, ans=0.0
2023-11-19 13:52:40,466 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 6650, loss[loss=0.08724, simple_loss=0.1121, pruned_loss=0.02362, audio_tagging_loss=0.00756, over 16054.00 frames. ], tot_loss[loss=0.08582, simple_loss=0.1052, pruned_loss=0.02294, audio_tagging_loss=0.01029, over 3039643.80 frames. ], batch size: 60, lr: 6.92e-03, grad_scale: 16.0
2023-11-19 13:52:44,868 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=765733.3333333334, ans=0.02
2023-11-19 13:53:00,160 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=765800.0, ans=0.125
2023-11-19 13:53:02,296 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=765866.6666666666, ans=0.125
2023-11-19 13:53:02,579 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.29 vs. limit=22.5
2023-11-19 13:53:11,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=765866.6666666666, ans=0.1
2023-11-19 13:53:35,928 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 6700, loss[loss=0.08844, simple_loss=0.1059, pruned_loss=0.02387, audio_tagging_loss=0.0116, over 15443.00 frames. ], tot_loss[loss=0.08514, simple_loss=0.1042, pruned_loss=0.02263, audio_tagging_loss=0.01042, over 3041629.77 frames. ], batch size: 58, lr: 6.91e-03, grad_scale: 16.0
2023-11-19 13:53:45,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=766133.3333333334, ans=0.0
2023-11-19 13:53:55,021 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.241e+01 9.083e+01 1.005e+02 1.420e+02, threshold=1.817e+02, percent-clipped=0.0
2023-11-19 13:54:16,410 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=766266.6666666666, ans=0.0
2023-11-19 13:54:16,420 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=766266.6666666666, ans=0.125
2023-11-19 13:54:21,824 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=766333.3333333334, ans=0.0
2023-11-19 13:54:31,096 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 6750, loss[loss=0.06067, simple_loss=0.07793, pruned_loss=0.01322, audio_tagging_loss=0.008475, over 14824.00 frames. ], tot_loss[loss=0.0853, simple_loss=0.1045, pruned_loss=0.02268, audio_tagging_loss=0.01037, over 3037310.19 frames. ], batch size: 54, lr: 6.91e-03, grad_scale: 16.0
2023-11-19 13:54:36,043 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=766400.0, ans=0.125
2023-11-19 13:54:37,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=766400.0, ans=0.125
2023-11-19 13:55:27,975 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 6800, loss[loss=0.0801, simple_loss=0.09406, pruned_loss=0.02184, audio_tagging_loss=0.01123, over 15051.00 frames. ], tot_loss[loss=0.08517, simple_loss=0.1046, pruned_loss=0.02265, audio_tagging_loss=0.01025, over 3042923.41 frames. ], batch size: 56, lr: 6.91e-03, grad_scale: 32.0
2023-11-19 13:55:28,155 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=766733.3333333334, ans=0.0
2023-11-19 13:55:29,735 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.91 vs. limit=10.0
2023-11-19 13:55:33,505 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 13:55:43,476 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=766800.0, ans=0.125
2023-11-19 13:55:44,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=766800.0, ans=0.0
2023-11-19 13:55:46,472 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.136e+01 8.286e+01 8.985e+01 9.839e+01 1.456e+02, threshold=1.797e+02, percent-clipped=0.0
2023-11-19 13:55:46,720 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 13:55:53,786 INFO [scaling.py:1022] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.46 vs. limit=5.0
2023-11-19 13:56:23,562 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 6850, loss[loss=0.0919, simple_loss=0.1109, pruned_loss=0.02576, audio_tagging_loss=0.01068, over 15562.00 frames. ], tot_loss[loss=0.08574, simple_loss=0.1054, pruned_loss=0.0228, audio_tagging_loss=0.01023, over 3034538.29 frames. ], batch size: 59, lr: 6.91e-03, grad_scale: 32.0
2023-11-19 13:56:33,389 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=767133.3333333334, ans=0.125
2023-11-19 13:56:35,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=767133.3333333334, ans=0.1
2023-11-19 13:56:43,260 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=767133.3333333334, ans=0.125
2023-11-19 13:56:46,035 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.350e-02
2023-11-19 13:57:08,428 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=767333.3333333334, ans=0.125
2023-11-19 13:57:13,592 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=767333.3333333334, ans=0.125
2023-11-19 13:57:18,753 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 6900, loss[loss=0.07598, simple_loss=0.09135, pruned_loss=0.02026, audio_tagging_loss=0.01004, over 15071.00 frames. ], tot_loss[loss=0.08574, simple_loss=0.1056, pruned_loss=0.02281, audio_tagging_loss=0.01016, over 3034858.91 frames. ], batch size: 56, lr: 6.91e-03, grad_scale: 32.0
2023-11-19 13:57:21,195 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=767400.0, ans=0.125
2023-11-19 13:57:25,837 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=767400.0, ans=0.0
2023-11-19 13:57:27,452 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=767400.0, ans=0.04949747468305833
2023-11-19 13:57:37,634 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.298e+01 8.115e+01 8.697e+01 9.342e+01 1.240e+02, threshold=1.739e+02, percent-clipped=0.0
2023-11-19 13:57:38,489 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.42 vs. limit=12.0
2023-11-19 13:58:00,772 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 13:58:12,640 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=767666.6666666666, ans=0.125
2023-11-19 13:58:14,579 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 6950, loss[loss=0.07114, simple_loss=0.08803, pruned_loss=0.01464, audio_tagging_loss=0.01248, over 14708.00 frames. ], tot_loss[loss=0.08574, simple_loss=0.1055, pruned_loss=0.02274, audio_tagging_loss=0.01025, over 3037780.65 frames. ], batch size: 55, lr: 6.91e-03, grad_scale: 32.0
2023-11-19 13:58:15,854 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=767733.3333333334, ans=0.07
2023-11-19 13:58:17,332 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.09 vs. limit=22.5
2023-11-19 13:58:20,047 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=767733.3333333334, ans=0.125
2023-11-19 13:58:42,286 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=767866.6666666666, ans=0.125
2023-11-19 13:59:09,886 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=768066.6666666666, ans=0.2
2023-11-19 13:59:10,777 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 7000, loss[loss=0.06876, simple_loss=0.0765, pruned_loss=0.0158, audio_tagging_loss=0.01471, over 15665.00 frames. ], tot_loss[loss=0.0859, simple_loss=0.1055, pruned_loss=0.02282, audio_tagging_loss=0.01034, over 3040498.20 frames. ], batch size: 62, lr: 6.90e-03, grad_scale: 32.0
2023-11-19 13:59:18,412 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=768066.6666666666, ans=0.125
2023-11-19 13:59:29,028 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.706e+01 8.254e+01 9.022e+01 1.015e+02 1.308e+02, threshold=1.804e+02, percent-clipped=0.0
2023-11-19 13:59:38,712 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.68 vs. limit=15.0
2023-11-19 13:59:44,094 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0
2023-11-19 13:59:45,647 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=768266.6666666666, ans=0.0
2023-11-19 13:59:57,750 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=768333.3333333334, ans=0.95
2023-11-19 14:00:04,857 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 7050, loss[loss=0.08962, simple_loss=0.1064, pruned_loss=0.02474, audio_tagging_loss=0.01165, over 15433.00 frames. ], tot_loss[loss=0.08576, simple_loss=0.1051, pruned_loss=0.02281, audio_tagging_loss=0.01043, over 3039826.87 frames. ], batch size: 58, lr: 6.90e-03, grad_scale: 32.0
2023-11-19 14:00:06,308 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.01 vs. limit=12.0
2023-11-19 14:00:18,612 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=768466.6666666666, ans=0.0
2023-11-19 14:00:43,897 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.08 vs. limit=15.0
2023-11-19 14:00:51,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=768666.6666666666, ans=0.1
2023-11-19 14:01:00,280 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 7100, loss[loss=0.07062, simple_loss=0.08226, pruned_loss=0.01811, audio_tagging_loss=0.01138, over 14769.00 frames. ], tot_loss[loss=0.0857, simple_loss=0.1047, pruned_loss=0.02278, audio_tagging_loss=0.01055, over 3042066.25 frames. ], batch size: 57, lr: 6.90e-03, grad_scale: 32.0
2023-11-19 14:01:13,644 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=768800.0, ans=0.125
2023-11-19 14:01:19,245 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.436e+01 9.096e+01 9.960e+01 1.200e+02, threshold=1.819e+02, percent-clipped=0.0
2023-11-19 14:01:29,019 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=768866.6666666666, ans=0.1
2023-11-19 14:01:51,193 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.44 vs. limit=22.5
2023-11-19 14:01:51,581 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=769000.0, ans=0.125
2023-11-19 14:01:56,309 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 7150, loss[loss=0.09375, simple_loss=0.1159, pruned_loss=0.02407, audio_tagging_loss=0.01174, over 15663.00 frames. ], tot_loss[loss=0.086, simple_loss=0.1049, pruned_loss=0.02285, audio_tagging_loss=0.01071, over 3042043.11 frames. ], batch size: 58, lr: 6.90e-03, grad_scale: 32.0
2023-11-19 14:02:04,579 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=769066.6666666666, ans=0.125
2023-11-19 14:02:52,145 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 7200, loss[loss=0.09255, simple_loss=0.1146, pruned_loss=0.02651, audio_tagging_loss=0.00874, over 15737.00 frames. ], tot_loss[loss=0.08536, simple_loss=0.1039, pruned_loss=0.02262, audio_tagging_loss=0.01077, over 3043823.14 frames. ], batch size: 58, lr: 6.90e-03, grad_scale: 32.0
2023-11-19 14:02:53,342 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=769400.0, ans=0.025
2023-11-19 14:03:02,525 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=769466.6666666666, ans=0.0
2023-11-19 14:03:11,329 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.379e+01 9.176e+01 1.015e+02 1.604e+02, threshold=1.835e+02, percent-clipped=0.0
2023-11-19 14:03:18,020 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=769533.3333333334, ans=0.1
2023-11-19 14:03:21,790 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=769533.3333333334, ans=0.0
2023-11-19 14:03:22,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=769533.3333333334, ans=0.125
2023-11-19 14:03:24,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=769600.0, ans=0.125
2023-11-19 14:03:48,454 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 7250, loss[loss=0.08503, simple_loss=0.0998, pruned_loss=0.02225, audio_tagging_loss=0.01288, over 14865.00 frames. ], tot_loss[loss=0.0855, simple_loss=0.1042, pruned_loss=0.02261, audio_tagging_loss=0.01081, over 3045905.45 frames. ], batch size: 56, lr: 6.90e-03, grad_scale: 32.0
2023-11-19 14:04:02,883 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=769800.0, ans=0.125
2023-11-19 14:04:07,245 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=769800.0, ans=0.0
2023-11-19 14:04:12,027 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=769866.6666666666, ans=0.125
2023-11-19 14:04:25,800 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.12 vs. limit=10.0
2023-11-19 14:04:43,484 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 7300, loss[loss=0.0656, simple_loss=0.0791, pruned_loss=0.01724, audio_tagging_loss=0.008815, over 14692.00 frames. ], tot_loss[loss=0.08528, simple_loss=0.1041, pruned_loss=0.02251, audio_tagging_loss=0.01075, over 3044348.48 frames. ], batch size: 57, lr: 6.90e-03, grad_scale: 32.0
2023-11-19 14:04:59,017 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.27 vs. limit=15.0
2023-11-19 14:05:02,641 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.433e+01 8.339e+01 9.048e+01 1.014e+02 1.411e+02, threshold=1.810e+02, percent-clipped=0.0
2023-11-19 14:05:25,184 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=770266.6666666666, ans=0.1
2023-11-19 14:05:32,103 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=770333.3333333334, ans=0.0
2023-11-19 14:05:32,118 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 14:05:37,402 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=770333.3333333334, ans=0.1
2023-11-19 14:05:38,983 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=770400.0, ans=0.0
2023-11-19 14:05:39,783 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 7350, loss[loss=0.1039, simple_loss=0.1328, pruned_loss=0.02907, audio_tagging_loss=0.008402, over 15689.00 frames. ], tot_loss[loss=0.08485, simple_loss=0.1038, pruned_loss=0.02238, audio_tagging_loss=0.01057, over 3043693.87 frames. ], batch size: 56, lr: 6.89e-03, grad_scale: 32.0
2023-11-19 14:05:44,549 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.26 vs. limit=15.0
2023-11-19 14:05:47,026 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=770400.0, ans=0.2
2023-11-19 14:05:53,727 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=770466.6666666666, ans=0.125
2023-11-19 14:05:59,253 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=770466.6666666666, ans=0.2
2023-11-19 14:06:01,330 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=770533.3333333334, ans=0.95
2023-11-19 14:06:13,543 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=770600.0, ans=0.125
2023-11-19 14:06:36,272 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 7400, loss[loss=0.1096, simple_loss=0.1318, pruned_loss=0.03576, audio_tagging_loss=0.007968, over 14401.00 frames. ], tot_loss[loss=0.0843, simple_loss=0.1033, pruned_loss=0.02224, audio_tagging_loss=0.01042, over 3042916.91 frames. ], batch size: 53, lr: 6.89e-03, grad_scale: 32.0
2023-11-19 14:06:40,708 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=770733.3333333334, ans=0.05
2023-11-19 14:06:54,608 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.256e+01 8.661e+01 9.537e+01 1.042e+02 1.641e+02, threshold=1.907e+02, percent-clipped=0.0
2023-11-19 14:06:59,097 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=770866.6666666666, ans=0.1
2023-11-19 14:06:59,099 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=770866.6666666666, ans=0.0
2023-11-19 14:07:14,367 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.36 vs. limit=12.0
2023-11-19 14:07:23,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=771000.0, ans=0.125
2023-11-19 14:07:31,157 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 7450, loss[loss=0.09112, simple_loss=0.1145, pruned_loss=0.02483, audio_tagging_loss=0.009014, over 16203.00 frames. ], tot_loss[loss=0.08422, simple_loss=0.1032, pruned_loss=0.02222, audio_tagging_loss=0.01038, over 3035065.16 frames. ], batch size: 58, lr: 6.89e-03, grad_scale: 32.0
2023-11-19 14:07:55,635 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.12 vs. limit=10.0
2023-11-19 14:08:05,363 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=771266.6666666666, ans=0.125
2023-11-19 14:08:07,292 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=771266.6666666666, ans=0.0
2023-11-19 14:08:10,532 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=771266.6666666666, ans=0.035
2023-11-19 14:08:26,734 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 7500, loss[loss=0.07399, simple_loss=0.09484, pruned_loss=0.01497, audio_tagging_loss=0.01159, over 16175.00 frames. ], tot_loss[loss=0.08439, simple_loss=0.1036, pruned_loss=0.02231, audio_tagging_loss=0.01028, over 3044582.26 frames. ], batch size: 60, lr: 6.89e-03, grad_scale: 32.0
2023-11-19 14:08:36,513 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=771400.0, ans=0.0
2023-11-19 14:08:43,409 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=771466.6666666666, ans=0.2
2023-11-19 14:08:46,250 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.991e+01 8.223e+01 8.899e+01 9.673e+01 1.181e+02, threshold=1.780e+02, percent-clipped=0.0
2023-11-19 14:09:21,768 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=771733.3333333334, ans=0.125
2023-11-19 14:09:23,262 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 7550, loss[loss=0.07484, simple_loss=0.08916, pruned_loss=0.01823, audio_tagging_loss=0.01203, over 15924.00 frames. ], tot_loss[loss=0.08407, simple_loss=0.1033, pruned_loss=0.02224, audio_tagging_loss=0.0102, over 3045806.75 frames. ], batch size: 60, lr: 6.89e-03, grad_scale: 32.0
2023-11-19 14:09:28,749 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=771733.3333333334, ans=0.2
2023-11-19 14:09:45,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=771866.6666666666, ans=0.1
2023-11-19 14:09:54,585 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=771933.3333333334, ans=0.07
2023-11-19 14:09:55,721 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=771933.3333333334, ans=0.125
2023-11-19 14:10:09,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=772000.0, ans=0.05
2023-11-19 14:10:14,942 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=772000.0, ans=0.125
2023-11-19 14:10:17,843 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 7600, loss[loss=0.08737, simple_loss=0.1033, pruned_loss=0.02346, audio_tagging_loss=0.01225, over 14374.00 frames. ], tot_loss[loss=0.08523, simple_loss=0.1045, pruned_loss=0.02273, audio_tagging_loss=0.01023, over 3048670.32 frames. ], batch size: 55, lr: 6.89e-03, grad_scale: 32.0
2023-11-19 14:10:36,183 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.972e+01 8.387e+01 9.154e+01 1.016e+02 1.447e+02, threshold=1.831e+02, percent-clipped=0.0
2023-11-19 14:10:37,450 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=772133.3333333334, ans=0.05
2023-11-19 14:11:00,845 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.05 vs. limit=15.0
2023-11-19 14:11:02,595 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=772333.3333333334, ans=0.125
2023-11-19 14:11:11,942 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.33 vs. limit=10.0
2023-11-19 14:11:13,460 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 7650, loss[loss=0.09606, simple_loss=0.1297, pruned_loss=0.02401, audio_tagging_loss=0.007206, over 16914.00 frames. ], tot_loss[loss=0.08503, simple_loss=0.1045, pruned_loss=0.02255, audio_tagging_loss=0.01021, over 3043572.28 frames. ], batch size: 61, lr: 6.89e-03, grad_scale: 16.0
2023-11-19 14:11:16,193 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.93 vs. limit=15.0
2023-11-19 14:11:30,094 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=772466.6666666666, ans=0.0
2023-11-19 14:11:55,311 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.71 vs. limit=15.0
2023-11-19 14:12:06,860 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=772666.6666666666, ans=0.125
2023-11-19 14:12:08,891 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 7700, loss[loss=0.08149, simple_loss=0.09831, pruned_loss=0.0179, audio_tagging_loss=0.01444, over 15820.00 frames. ], tot_loss[loss=0.08492, simple_loss=0.1045, pruned_loss=0.0225, audio_tagging_loss=0.01019, over 3041793.85 frames. ], batch size: 57, lr: 6.88e-03, grad_scale: 16.0
2023-11-19 14:12:27,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=772800.0, ans=0.1
2023-11-19 14:12:29,298 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.117e+01 8.040e+01 8.605e+01 9.398e+01 1.279e+02, threshold=1.721e+02, percent-clipped=0.0
2023-11-19 14:12:37,053 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=772866.6666666666, ans=0.0
2023-11-19 14:12:53,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=773000.0, ans=0.125
2023-11-19 14:13:04,322 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 7750, loss[loss=0.06974, simple_loss=0.07881, pruned_loss=0.01904, audio_tagging_loss=0.0113, over 14246.00 frames. ], tot_loss[loss=0.08509, simple_loss=0.1045, pruned_loss=0.02258, audio_tagging_loss=0.01025, over 3038533.62 frames. ], batch size: 56, lr: 6.88e-03, grad_scale: 16.0
2023-11-19 14:13:11,986 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.04 vs. limit=15.0
2023-11-19 14:13:21,995 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=773133.3333333334, ans=0.125
2023-11-19 14:13:25,543 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.24 vs. limit=15.0
2023-11-19 14:13:29,368 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=773200.0, ans=0.125
2023-11-19 14:13:30,957 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=773200.0, ans=0.0
2023-11-19 14:13:33,141 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=773200.0, ans=0.1
2023-11-19 14:13:45,699 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=773266.6666666666, ans=0.2
2023-11-19 14:14:01,956 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 7800, loss[loss=0.07372, simple_loss=0.09202, pruned_loss=0.01959, audio_tagging_loss=0.008115, over 14210.00 frames. ], tot_loss[loss=0.08637, simple_loss=0.106, pruned_loss=0.02307, audio_tagging_loss=0.0103, over 3041897.04 frames. ], batch size: 54, lr: 6.88e-03, grad_scale: 16.0
2023-11-19 14:14:23,104 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.774e+01 8.739e+01 9.425e+01 1.048e+02 2.167e+02, threshold=1.885e+02, percent-clipped=1.0
2023-11-19 14:14:28,765 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=773533.3333333334, ans=0.0
2023-11-19 14:14:37,670 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=773600.0, ans=0.1
2023-11-19 14:14:39,748 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=773600.0, ans=0.0
2023-11-19 14:14:50,411 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=773666.6666666666, ans=0.05
2023-11-19 14:14:57,119 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 7850, loss[loss=0.08384, simple_loss=0.1078, pruned_loss=0.02151, audio_tagging_loss=0.008421, over 15241.00 frames. ], tot_loss[loss=0.08633, simple_loss=0.1058, pruned_loss=0.02305, audio_tagging_loss=0.01039, over 3044912.73 frames. ], batch size: 56, lr: 6.88e-03, grad_scale: 8.0
2023-11-19 14:15:09,326 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.30 vs. limit=15.0
2023-11-19 14:15:11,288 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.06 vs. limit=12.0
2023-11-19 14:15:28,304 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=773866.6666666666, ans=0.2
2023-11-19 14:15:28,480 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=773866.6666666666, ans=0.125
2023-11-19 14:15:29,715 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.41 vs. limit=15.0
2023-11-19 14:15:53,097 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 7900, loss[loss=0.08861, simple_loss=0.1076, pruned_loss=0.02202, audio_tagging_loss=0.01277, over 15805.00 frames. ], tot_loss[loss=0.08673, simple_loss=0.1061, pruned_loss=0.02315, audio_tagging_loss=0.01052, over 3055964.67 frames. ], batch size: 59, lr: 6.88e-03, grad_scale: 8.0
2023-11-19 14:15:59,604 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=774066.6666666666, ans=0.125
2023-11-19 14:16:02,125 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=774066.6666666666, ans=0.0
2023-11-19 14:16:12,616 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=774133.3333333334, ans=0.125
2023-11-19 14:16:13,449 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 8.211e+01 8.938e+01 9.745e+01 1.285e+02, threshold=1.788e+02, percent-clipped=0.0
2023-11-19 14:16:16,846 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=774200.0, ans=0.1
2023-11-19 14:16:22,602 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=774200.0, ans=0.1
2023-11-19 14:16:29,726 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=774266.6666666666, ans=22.5
2023-11-19 14:16:48,182 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 7950, loss[loss=0.06421, simple_loss=0.0772, pruned_loss=0.01328, audio_tagging_loss=0.01234, over 14673.00 frames. ], tot_loss[loss=0.08682, simple_loss=0.1063, pruned_loss=0.02304, audio_tagging_loss=0.01063, over 3055870.43 frames. ], batch size: 57, lr: 6.88e-03, grad_scale: 8.0
2023-11-19 14:16:51,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=774400.0, ans=0.0
2023-11-19 14:16:54,762 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=774400.0, ans=0.0
2023-11-19 14:16:59,894 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 14:17:19,698 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=774533.3333333334, ans=0.125
2023-11-19 14:17:43,696 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 8000, loss[loss=0.07555, simple_loss=0.09042, pruned_loss=0.01875, audio_tagging_loss=0.0116, over 16319.00 frames. ], tot_loss[loss=0.08565, simple_loss=0.1045, pruned_loss=0.02265, audio_tagging_loss=0.01075, over 3052887.70 frames. ], batch size: 63, lr: 6.88e-03, grad_scale: 16.0
2023-11-19 14:18:05,287 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.487e+01 8.348e+01 9.114e+01 9.901e+01 1.524e+02, threshold=1.823e+02, percent-clipped=0.0
2023-11-19 14:18:10,526 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.99 vs. limit=22.5
2023-11-19 14:18:18,681 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=774933.3333333334, ans=0.125
2023-11-19 14:18:22,999 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=774933.3333333334, ans=0.0
2023-11-19 14:18:25,142 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=774933.3333333334, ans=0.125
2023-11-19 14:18:33,751 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=775000.0, ans=0.0
2023-11-19 14:18:39,446 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.07 vs. limit=15.0
2023-11-19 14:18:39,864 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 8050, loss[loss=0.1121, simple_loss=0.1398, pruned_loss=0.0313, audio_tagging_loss=0.01092, over 14471.00 frames. ], tot_loss[loss=0.0868, simple_loss=0.1059, pruned_loss=0.02313, audio_tagging_loss=0.01072, over 3059420.41 frames. ], batch size: 54, lr: 6.87e-03, grad_scale: 16.0
2023-11-19 14:18:47,090 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=775066.6666666666, ans=0.1
2023-11-19 14:18:55,011 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=775133.3333333334, ans=0.1
2023-11-19 14:19:17,139 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=775266.6666666666, ans=0.0
2023-11-19 14:19:21,344 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=775266.6666666666, ans=0.125
2023-11-19 14:19:23,499 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=775333.3333333334, ans=0.1
2023-11-19 14:19:26,592 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.66 vs. limit=15.0
2023-11-19 14:19:36,026 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 8100, loss[loss=0.09999, simple_loss=0.1349, pruned_loss=0.02553, audio_tagging_loss=0.007025, over 16027.00 frames. ], tot_loss[loss=0.08624, simple_loss=0.1052, pruned_loss=0.02304, audio_tagging_loss=0.0106, over 3056575.72 frames. ], batch size: 56, lr: 6.87e-03, grad_scale: 16.0
2023-11-19 14:19:51,119 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=775466.6666666666, ans=0.0
2023-11-19 14:19:56,602 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.035e+01 8.298e+01 8.945e+01 9.680e+01 1.266e+02, threshold=1.789e+02, percent-clipped=0.0
2023-11-19 14:20:07,250 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=775533.3333333334, ans=0.125
2023-11-19 14:20:12,072 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=775600.0, ans=0.125
2023-11-19 14:20:19,850 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=775666.6666666666, ans=0.0
2023-11-19 14:20:23,399 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.28 vs. limit=15.0
2023-11-19 14:20:26,259 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=775666.6666666666, ans=0.125
2023-11-19 14:20:31,234 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 8150, loss[loss=0.0772, simple_loss=0.09257, pruned_loss=0.01996, audio_tagging_loss=0.01095, over 15315.00 frames. ], tot_loss[loss=0.08632, simple_loss=0.1055, pruned_loss=0.02319, audio_tagging_loss=0.01038, over 3053418.84 frames. ], batch size: 59, lr: 6.87e-03, grad_scale: 8.0
2023-11-19 14:21:12,866 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=775933.3333333334, ans=0.125
2023-11-19 14:21:24,807 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=776000.0, ans=0.125
2023-11-19 14:21:26,667 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 14:21:26,908 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=776066.6666666666, ans=0.0
2023-11-19 14:21:27,678 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 8200, loss[loss=0.08307, simple_loss=0.1155, pruned_loss=0.01876, audio_tagging_loss=0.00655, over 15143.00 frames. ], tot_loss[loss=0.08638, simple_loss=0.1059, pruned_loss=0.02322, audio_tagging_loss=0.0102, over 3053614.37 frames. ], batch size: 57, lr: 6.87e-03, grad_scale: 8.0
2023-11-19 14:21:33,244 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=776066.6666666666, ans=0.07
2023-11-19 14:21:49,062 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=776200.0, ans=0.0
2023-11-19 14:21:49,752 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.188e+01 8.524e+01 9.249e+01 1.061e+02 1.477e+02, threshold=1.850e+02, percent-clipped=0.0
2023-11-19 14:22:23,481 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 8250, loss[loss=0.0775, simple_loss=0.09777, pruned_loss=0.01708, audio_tagging_loss=0.01153, over 15542.00 frames. ], tot_loss[loss=0.08635, simple_loss=0.1059, pruned_loss=0.02322, audio_tagging_loss=0.01021, over 3052232.15 frames. ], batch size: 57, lr: 6.87e-03, grad_scale: 8.0
2023-11-19 14:22:26,782 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=776400.0, ans=0.0
2023-11-19 14:22:30,363 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.23 vs. limit=22.5
2023-11-19 14:22:44,278 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=776533.3333333334, ans=0.0
2023-11-19 14:23:16,528 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=776666.6666666666, ans=0.1
2023-11-19 14:23:18,347 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 8300, loss[loss=0.07769, simple_loss=0.08554, pruned_loss=0.02338, audio_tagging_loss=0.01154, over 14592.00 frames. ], tot_loss[loss=0.0855, simple_loss=0.1046, pruned_loss=0.0229, audio_tagging_loss=0.01029, over 3049574.92 frames. ], batch size: 59, lr: 6.87e-03, grad_scale: 8.0
2023-11-19 14:23:21,677 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=776733.3333333334, ans=0.0
2023-11-19 14:23:35,309 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=776800.0, ans=0.07
2023-11-19 14:23:36,618 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.36 vs. limit=15.0
2023-11-19 14:23:40,361 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.122e+01 8.080e+01 8.886e+01 9.811e+01 1.458e+02, threshold=1.777e+02, percent-clipped=0.0
2023-11-19 14:23:42,657 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=776866.6666666666, ans=0.0
2023-11-19 14:23:44,792 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=776866.6666666666, ans=0.0
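
The optim.py:476 entries summarize recent gradient norms as five quantiles (min, 25%, median, 75%, max). The printed threshold matches Clipping_scale times the median, e.g. 2.0 * 9.249e+01 ≈ 1.850e+02 in the record above, and percent-clipped counts how many norms exceeded it. A minimal sketch of that bookkeeping, assuming median-based clipping (the optimizer's internals are not shown in the log):

import torch

def clip_by_median(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # Five-point summary of recent gradient norms, as printed in the log.
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]                    # scale times the median
    percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
    clipped = torch.minimum(grad_norms, threshold)       # cap norms at the threshold
    return clipped, threshold, q, percent_clipped
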
2023-11-19 14:24:14,330 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 8350, loss[loss=0.06552, simple_loss=0.08362, pruned_loss=0.01441, audio_tagging_loss=0.009297, over 15948.00 frames. ], tot_loss[loss=0.08513, simple_loss=0.1044, pruned_loss=0.02276, audio_tagging_loss=0.01018, over 3052111.34 frames. ], batch size: 60, lr: 6.86e-03, grad_scale: 8.0
2023-11-19 14:24:20,454 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.20 vs. limit=15.0
2023-11-19 14:24:31,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=777133.3333333334, ans=0.0
2023-11-19 14:24:34,486 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=777133.3333333334, ans=0.015
2023-11-19 14:24:47,080 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=777266.6666666666, ans=0.125
2023-11-19 14:24:48,722 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=777266.6666666666, ans=0.0
2023-11-19 14:24:49,037 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.71 vs. limit=12.0
2023-11-19 14:24:49,752 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=777266.6666666666, ans=0.125
2023-11-19 14:25:00,045 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=777333.3333333334, ans=0.0
2023-11-19 14:25:01,066 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=777333.3333333334, ans=0.125
2023-11-19 14:25:07,417 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=777333.3333333334, ans=0.125
2023-11-19 14:25:09,775 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 8400, loss[loss=0.1054, simple_loss=0.1375, pruned_loss=0.02983, audio_tagging_loss=0.006861, over 15765.00 frames. ], tot_loss[loss=0.08452, simple_loss=0.1035, pruned_loss=0.02251, audio_tagging_loss=0.01027, over 3050839.70 frames. ], batch size: 55, lr: 6.86e-03, grad_scale: 16.0
2023-11-19 14:25:16,956 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=777400.0, ans=0.0
2023-11-19 14:25:27,548 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=777466.6666666666, ans=0.125
2023-11-19 14:25:32,116 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.696e+01 8.445e+01 9.359e+01 1.034e+02 1.708e+02, threshold=1.872e+02, percent-clipped=0.0
2023-11-19 14:25:38,407 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.63 vs. limit=22.5
2023-11-19 14:25:43,374 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=777600.0, ans=0.125
2023-11-19 14:25:44,586 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=777600.0, ans=0.07
2023-11-19 14:25:52,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=777600.0, ans=0.125
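
The printed learning rate decays slowly (6.87e-03 in the earlier records, 6.86e-03 here) because this run uses an Eden-style schedule with base_lr=0.045, lr_batches=7500 and lr_epochs=3.5. Below is a sketch of that formula as documented in icefall's optim.py; the exact batch/epoch inputs the scheduler sees at this point are not recoverable from the log.

def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Both factors start near 1.0 and decay as batch/epoch grow, which is why
    # the printed lr moves only in the third significant digit across this span.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor
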
2023-11-19 14:26:05,411 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 8450, loss[loss=0.1045, simple_loss=0.129, pruned_loss=0.0311, audio_tagging_loss=0.008922, over 14799.00 frames. ], tot_loss[loss=0.08532, simple_loss=0.1044, pruned_loss=0.02275, audio_tagging_loss=0.01035, over 3046519.64 frames. ], batch size: 53, lr: 6.86e-03, grad_scale: 16.0
2023-11-19 14:26:14,855 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.87 vs. limit=6.0
2023-11-19 14:26:22,716 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=777800.0, ans=0.125
2023-11-19 14:26:43,639 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=777933.3333333334, ans=0.125
2023-11-19 14:26:54,275 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=778000.0, ans=0.0
2023-11-19 14:26:57,965 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=778000.0, ans=0.1
2023-11-19 14:27:00,430 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=778066.6666666666, ans=0.2
2023-11-19 14:27:01,313 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 8500, loss[loss=0.08814, simple_loss=0.1156, pruned_loss=0.0226, audio_tagging_loss=0.007727, over 15688.00 frames. ], tot_loss[loss=0.08571, simple_loss=0.1052, pruned_loss=0.02281, audio_tagging_loss=0.0103, over 3053223.70 frames. ], batch size: 58, lr: 6.86e-03, grad_scale: 16.0
2023-11-19 14:27:17,524 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=778133.3333333334, ans=0.125
2023-11-19 14:27:19,761 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=778133.3333333334, ans=0.125
2023-11-19 14:27:24,292 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.947e+01 8.482e+01 9.526e+01 1.059e+02 1.313e+02, threshold=1.905e+02, percent-clipped=0.0
2023-11-19 14:27:25,573 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=778200.0, ans=0.125
2023-11-19 14:27:56,057 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 8550, loss[loss=0.07246, simple_loss=0.08156, pruned_loss=0.01912, audio_tagging_loss=0.01256, over 15464.00 frames. ], tot_loss[loss=0.08528, simple_loss=0.1046, pruned_loss=0.02265, audio_tagging_loss=0.01035, over 3056791.63 frames. ], batch size: 57, lr: 6.86e-03, grad_scale: 8.0
2023-11-19 14:27:56,297 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=778400.0, ans=0.0
2023-11-19 14:28:14,176 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=778466.6666666666, ans=0.0
2023-11-19 14:28:27,073 INFO [scaling.py:1118] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.935e-02
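
The scaling.py:1022 entries compare a per-module whitening metric against a limit (e.g. metric=5.87 vs. limit=6.0 for whiten_keys above); when the metric exceeds the limit, the module's activations are regularized back toward a whiter, less correlated covariance. One plausible form of such a metric, which equals 1.0 for perfectly white features and grows with channel correlation, is sketched below; this is an assumed formulation, not a quotation of icefall's code.

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels) features; returns covariance anisotropy."""
    x = x - x.mean(dim=0, keepdim=True)      # center each channel
    cov = (x.T @ x) / x.shape[0]             # (C, C) covariance estimate
    c = cov.shape[0]
    # Equals 1.0 iff cov is a multiple of the identity; larger when channels correlate.
    return c * torch.trace(cov @ cov) / torch.trace(cov) ** 2
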
2023-11-19 14:28:43,955 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=15.0
2023-11-19 14:28:50,181 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=778666.6666666666, ans=0.0
2023-11-19 14:28:52,521 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 8600, loss[loss=0.1109, simple_loss=0.1427, pruned_loss=0.03151, audio_tagging_loss=0.008095, over 16104.00 frames. ], tot_loss[loss=0.08548, simple_loss=0.1048, pruned_loss=0.02266, audio_tagging_loss=0.01042, over 3056467.44 frames. ], batch size: 59, lr: 6.86e-03, grad_scale: 8.0
2023-11-19 14:29:15,653 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 5.831e+01 8.133e+01 8.812e+01 9.832e+01 1.513e+02, threshold=1.762e+02, percent-clipped=0.0
2023-11-19 14:29:17,924 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=778866.6666666666, ans=0.0
2023-11-19 14:29:19,122 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=778866.6666666666, ans=0.125
2023-11-19 14:29:27,345 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.17 vs. limit=12.0
2023-11-19 14:29:42,165 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=779000.0, ans=0.1
2023-11-19 14:29:43,114 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=779000.0, ans=0.0
2023-11-19 14:29:47,748 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 8650, loss[loss=0.06424, simple_loss=0.07439, pruned_loss=0.01738, audio_tagging_loss=0.009665, over 14887.00 frames. ], tot_loss[loss=0.0852, simple_loss=0.1045, pruned_loss=0.02244, audio_tagging_loss=0.0105, over 3053183.10 frames. ], batch size: 57, lr: 6.86e-03, grad_scale: 8.0
2023-11-19 14:30:26,241 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=779266.6666666666, ans=0.0
2023-11-19 14:30:43,883 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 8700, loss[loss=0.1147, simple_loss=0.1376, pruned_loss=0.03489, audio_tagging_loss=0.01099, over 15208.00 frames. ], tot_loss[loss=0.0855, simple_loss=0.1046, pruned_loss=0.0227, audio_tagging_loss=0.01051, over 3054178.60 frames. ], batch size: 58, lr: 6.85e-03, grad_scale: 8.0
2023-11-19 14:30:44,199 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=779400.0, ans=0.2
2023-11-19 14:30:51,166 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=779400.0, ans=0.1
2023-11-19 14:31:07,619 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.174e+01 9.057e+01 1.021e+02 2.200e+02, threshold=1.811e+02, percent-clipped=1.0
2023-11-19 14:31:25,945 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=779600.0, ans=0.125
2023-11-19 14:31:39,901 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 8750, loss[loss=0.09803, simple_loss=0.1141, pruned_loss=0.02951, audio_tagging_loss=0.01149, over 14728.00 frames. ], tot_loss[loss=0.08552, simple_loss=0.1045, pruned_loss=0.02266, audio_tagging_loss=0.01061, over 3057056.58 frames. ], batch size: 55, lr: 6.85e-03, grad_scale: 8.0
2023-11-19 14:31:50,303 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=779800.0, ans=0.05
2023-11-19 14:32:05,252 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=22.5
2023-11-19 14:32:32,710 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=780000.0, ans=0.2
2023-11-19 14:32:36,128 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 8800, loss[loss=0.1061, simple_loss=0.1297, pruned_loss=0.02931, audio_tagging_loss=0.01199, over 15346.00 frames. ], tot_loss[loss=0.0862, simple_loss=0.1055, pruned_loss=0.02269, audio_tagging_loss=0.01073, over 3051616.56 frames. ], batch size: 57, lr: 6.85e-03, grad_scale: 16.0
2023-11-19 14:32:44,232 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=780066.6666666666, ans=0.0
2023-11-19 14:32:45,461 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=780066.6666666666, ans=0.0
2023-11-19 14:32:57,537 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=780200.0, ans=0.125
2023-11-19 14:32:58,815 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=780200.0, ans=22.5
2023-11-19 14:32:59,411 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 8.658e+01 9.293e+01 1.017e+02 2.957e+02, threshold=1.859e+02, percent-clipped=2.0
2023-11-19 14:33:08,396 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.88 vs. limit=15.0
2023-11-19 14:33:20,646 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=780333.3333333334, ans=0.035
2023-11-19 14:33:24,503 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=780333.3333333334, ans=0.0
2023-11-19 14:33:31,585 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 8850, loss[loss=0.06921, simple_loss=0.08417, pruned_loss=0.01589, audio_tagging_loss=0.01124, over 15200.00 frames. ], tot_loss[loss=0.08621, simple_loss=0.1056, pruned_loss=0.02271, audio_tagging_loss=0.0107, over 3052330.84 frames. ], batch size: 58, lr: 6.85e-03, grad_scale: 16.0
2023-11-19 14:33:40,483 WARNING [train_asr.py:1319] (3/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 14:33:53,270 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=780533.3333333334, ans=0.0
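
The WARNING above (like the earlier one for 8C7biyx9TQ4) shows why these AudioSet cuts are dropped: a 1-second cut yields 100 feature frames but only 23 encoder frames after 4x subsampling, and a 24-token dummy transcript cannot be aligned to 23 output frames by the transducer loss. A sketch of the implied filter; the subsampled-length formula below reproduces 100 -> 23 for this run but is an assumption, not the exact icefall code:

def frames_after_subsampling(num_frames: int) -> int:
    # Two stride-2 stages consistent with an overall subsampling factor of 4.
    return ((num_frames - 7) // 2 + 1) // 2      # 100 -> 23

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Drop cuts whose token count exceeds the available encoder frames.
    return frames_after_subsampling(num_frames) >= num_tokens

print(keep_cut(100, 24))  # False: the 1-second dummy-text cuts are excluded
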
2023-11-19 14:33:53,529 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.11 vs. limit=15.0
2023-11-19 14:33:59,986 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.75 vs. limit=15.0
2023-11-19 14:34:00,488 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=780533.3333333334, ans=0.125
2023-11-19 14:34:26,784 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 8900, loss[loss=0.07402, simple_loss=0.08198, pruned_loss=0.01785, audio_tagging_loss=0.01517, over 15157.00 frames. ], tot_loss[loss=0.08605, simple_loss=0.1058, pruned_loss=0.02265, audio_tagging_loss=0.01051, over 3057807.25 frames. ], batch size: 58, lr: 6.85e-03, grad_scale: 16.0
2023-11-19 14:34:48,518 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=780866.6666666666, ans=0.125
2023-11-19 14:34:48,613 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=780866.6666666666, ans=0.2
2023-11-19 14:34:50,419 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 6.845e+01 8.412e+01 9.119e+01 1.005e+02 1.451e+02, threshold=1.824e+02, percent-clipped=0.0
2023-11-19 14:35:00,734 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=780933.3333333334, ans=0.125
2023-11-19 14:35:11,545 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=781000.0, ans=0.2
2023-11-19 14:35:15,620 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=781000.0, ans=0.125
2023-11-19 14:35:16,851 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=781000.0, ans=0.04949747468305833
2023-11-19 14:35:22,323 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 8950, loss[loss=0.08507, simple_loss=0.1143, pruned_loss=0.02255, audio_tagging_loss=0.005357, over 15468.00 frames. ], tot_loss[loss=0.08649, simple_loss=0.1067, pruned_loss=0.02291, audio_tagging_loss=0.01025, over 3053066.44 frames. ], batch size: 55, lr: 6.85e-03, grad_scale: 16.0
2023-11-19 14:35:29,433 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=781066.6666666666, ans=0.0
2023-11-19 14:35:43,608 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=781200.0, ans=0.125
2023-11-19 14:36:00,815 INFO [scaling.py:1022] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.23 vs. limit=22.5
2023-11-19 14:36:12,022 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=781333.3333333334, ans=0.1
2023-11-19 14:36:18,253 INFO [train_asr.py:1115] (3/4) Epoch 10, batch 9000, loss[loss=0.07905, simple_loss=0.09346, pruned_loss=0.01925, audio_tagging_loss=0.01307, over 15326.00 frames. ], tot_loss[loss=0.08717, simple_loss=0.1074, pruned_loss=0.02328, audio_tagging_loss=0.01017, over 3055956.84 frames. ], batch size: 58, lr: 6.85e-03, grad_scale: 16.0
2023-11-19 14:36:18,254 INFO [train_asr.py:1138] (3/4) Computing validation loss
2023-11-19 14:36:58,292 INFO [train_asr.py:1147] (3/4) Epoch 10, validation: loss=0.06535, simple_loss=0.05527, pruned_loss=0.006386, audio_tagging_loss=0.03133, over 4681554.00 frames.
2023-11-19 14:36:58,293 INFO [train_asr.py:1148] (3/4) Maximum memory allocated so far is 25390MB
2023-11-19 14:37:01,858 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=781400.0, ans=0.1
2023-11-19 14:37:06,226 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=781400.0, ans=0.0
2023-11-19 14:37:19,641 INFO [scaling.py:213] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=781466.6666666666, ans=0.0
2023-11-19 14:37:29,704 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 7.012e+01 8.251e+01 8.814e+01 9.719e+01 1.451e+02, threshold=1.763e+02, percent-clipped=0.0
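
The "Maximum memory allocated so far is 25390MB" line above reports the peak CUDA allocation on this rank after the validation pass. A sketch of such reporting using PyTorch's standard peak-allocation counter (the exact call site in train_asr.py is not shown in the log):

import torch

if torch.cuda.is_available():
    # Peak bytes ever allocated on the current device since process start.
    peak_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
    print(f"Maximum memory allocated so far is {peak_mb}MB")
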